LUNG CANCER MARKER

TECHNICAL FIELD

The invention relates to genes and proteins specific
for certain cancers and methods for their detection.

BACKGROUND OF THE INVENTION

Lung cancer is the most common form of cancer in the
world. Estimates for the year 1985 indicate that there
were about 900,000 cases of lung cancer worldwide.
(Parkin, et al., "Estimates of the worldwide incidence of
eighteen major cancers in 1985," Int J Cancer 1993;
54:594-606). For the United States alone, 1993
projections placed the number of new lung cancer cases at
170,000, with a mortality of about 88%. (Boring, et al.,
"Cancer statistics," CA Cancer J Clin 1993; 43:7-26).
Although the occurrence of breast cancer is slightly more
common in the United States, lung cancer is second behind
prostate cancer for males and third behind breast and
colorectal cancers for women. Yet, lung cancer is the
most common cause of cancer deaths.

The World Health Organization classifies lung cancer
into four major histological types: (1) squamous cell
carcinoma (SCC), (2) adenocarcinoma, (3) large cell
carcinoma, and (4) small cell lung carcinoma (SCLC). (The
World Health Organization, "The World Health Organization
histological typing of lung tumours," Am J Clin Pathol
1982; 77:123-136). However, there is a great deal of
tumor heterogeneity even within the various subtypes, and
it is not uncommon for lung cancer to have features of
more than one morphologic subtype. The term non-small
cell lung carcinoma (NSCLC) includes squamous cell
carcinoma, adenocarcinoma and large cell carcinoma.

Typically, a combination of X-ray and sputum cytology
is used to diagnose lung cancer. Unfortunately, by the
time a patient seeks medical help for symptoms, the
cancer is usually so advanced that it is incurable.
(Cancer Facts and Figures (based on rates from
NCI SEER Program 1977-1981), New York: American Cancer
Society, 1986). Routine large-scale radiologic or
cytologic screening of smokers has been investigated.
Studies concluded that cytomorphological screening did not
significantly reduce the mortality rate from lung cancer
and was not recommended for routine use. ("Early lung
cancer detection: summary & conclusions," Am Rev Respir
Dis 1984; 130:565-70). However, in a subpopulation of
patients where the cancer is diagnosed at a very early
stage and the lung is surgically resected, there is a
5-year survival rate of 70-90%. (Flehinger, et al., "The
effect of surgical treatment on survival from early lung
cancer," Chest 1992; 101:1013-1018; Melamed, et al.,
"Screening for early lung cancer: results of the Memorial
Sloan-Kettering Study in New York," Chest 1984; 86:44-53).

Therefore, research has focused on early detection of 
tumor markers before the cancer becomes clinically 
apparent and while the cancer is still localized and 
amenable to therapy. 

The identification of antigens associated with lung
cancer has stimulated considerable interest because of
their use in screening, diagnosis, clinical management,
and potential treatment of lung cancer. International
workshops have attempted to classify the lung cancer
antigens into 15 possible clusters that may define
histologic origins. (Souhami, et al., "Antigens of lung
cancer: results of the second international workshop on
lung cancer antigens," JNCI 1991; 83:609-612). As of
1988, more than 200 monoclonal antibodies (MAbs) had been
reported to react with human lung tumors. (Radosevich, et
al., "Monoclonal antibody assays for lung cancer," In:
Cancer Diagnosis in Vitro Using Monoclonal Antibodies.
Edited by H. A. Kupchik. New York: Marcel Dekker, 1988).

MAbs for lung cancer were first developed to
distinguish NSCLC from SCLC. (Mulshine, et al.,
"Monoclonal antibodies that distinguish nonsmall-cell from
small-cell lung cancer," J Immunol 1983; 121:497-502). In
most cases, the identity of the cell surface antigen with
which a particular antibody reacts is not known, or has
not been well characterized. (Scott, et al., "Early lung
cancer detection using monoclonal antibodies," In: Lung
Cancer. Edited by J.A. Roth, J.D. Cox, and W.K. Hong.
Boston: Blackwell Scientific Publications, 1993).

MAbs have been used in the immunocytochemical
staining of sputum samples to predict the progression of
lung cancer. (Tockman, et al., "Sensitive and specific
monoclonal antibody recognition of human lung cancer
antigen on preserved sputum cells: a new approach to early
lung cancer detection," J Clin Oncol 1988; 6:1685-1693).
In the study, two MAbs were utilized: 624H12, which binds
a glycolipid antigen expressed in SCLC, and 703D4, which
is directed to a protein antigen of NSCLC. Of the sputum
specimens from participants who progressed to lung cancer,
two-thirds showed positive reactivity with either the SCLC
or the NSCLC MAb. In contrast, of those who did not
progress to lung cancer, 35 of 40 did not react with the
SCLC or NSCLC MAb. This study suggests the need for the
development of additional early detection targets to
discover the onset of malignancy at the earliest possible
stage.
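
The figures reported for the sputum study can be restated
as conventional screening metrics. The following is a
minimal Python sketch and is not part of the cited study.
The specificity follows directly from the 35 of 40
non-progressors reported above; the sensitivity is taken as
the stated fraction of two-thirds because the underlying
counts are not given here. The function and variable names
are illustrative assumptions.

```python
from fractions import Fraction


def proportion(hits: int, total: int) -> Fraction:
    """Return hits/total as an exact fraction."""
    if total <= 0 or not 0 <= hits <= total:
        raise ValueError("need 0 <= hits <= total and total > 0")
    return Fraction(hits, total)


# Non-progressors: 35 of 40 specimens did not react with either MAb.
specificity = proportion(35, 40)
false_positive_rate = 1 - specificity

# Progressors: only the fraction (two-thirds) is stated, not the counts.
sensitivity = Fraction(2, 3)
missed_fraction = 1 - sensitivity

print(f"specificity         = {float(specificity):.3f}")
print(f"false positive rate = {float(false_positive_rate):.3f}")
print(f"sensitivity         = {float(sensitivity):.3f}")
print(f"missed progressors  = {float(missed_fraction):.3f}")
```

On these figures roughly one progressor in three would go
undetected by the two-antibody panel, which is the gap that
additional early detection targets are meant to close.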

Carcinoembryonic antigen (CEA) is a frequently
studied tumor marker of cancer, including lung cancer.
(Nutini, et al., "Serum NSE, CEA, CT, CA 15-3 levels in
human lung cancer," Int J Biol Markers 1990; 5:198-202).
Squamous cell carcinoma antigen is another established
serum marker. (Margolis, et al., "Serum tumor markers in
non-small cell lung cancer," Cancer 1994; 73:605-609).
Other serum antigens for lung cancer include antigens
recognized by MAbs 5E8, 5C7, and 1F10, the combination of
which distinguishes patients with lung cancer from
those without. (Schepart, et al., "Monoclonal
antibody-mediated detection of lung cancer antigens in
serum," Am Rev Respir Dis 1988; 138:1434-8). Furthermore,
the combination of 5E8, 5C7 and 1F10 was more sensitive,
specific and accurate for identifying NSCLC when compared
to results from a combination of the CEA and squamous cell
carcinoma antigen tests. (Margolis, et al., Cancer 1994;
73:605-609).

Serum CA 125, initially described as an ovarian 
cancer-associated antigen, has been investigated for its 
use as a prognostic factor in NSCLC. (Diez, et al., 
"Prognostic significance of serum CA 125 antigen assay in 
patients with non-small cell lung cancer," Cancer 1994; 
73:1368-76). The study determined that the preoperative
serum level of CA 125 antigen is inversely correlated with 
survival and tumor relapse in NSCLC. 

Despite the numerous examples of MAb applications,
none has yet emerged that has changed clinical practice.
(Mulshine, et al., "Applications of monoclonal antibodies
in the treatment of solid tumors," In: Biologic Therapy of
Cancer. Edited by V.T. Devita, S. Hellman, and S.A.
Rosenberg. Philadelphia: JB Lippincott, 1991, pp. 563-588).
MAbs alone may not be the answer to early detection.
First, there has been only moderate success with
immunologic reagents for paraffin-embedded tissue.
Second, lung cancer may express features that cannot be
differentiated by antibodies, for example chromosomal
deletions, gene amplification, translocation, and
alteration in enzymatic activity.

After the gene encoding a MAb-recognized surface
antigen has been cloned, cytogenetic and molecular
techniques may provide powerful tools for screening,
diagnosis, management and ultimately treatment of lung
cancer. An example of a lung cancer antigen that has been
cloned is the adenocarcinoma-associated antigen. This
antigen, recognized by the KS1/4 MAb, is an epithelial
malignancy/epithelial tissue glycoprotein from the human
lung adenocarcinoma cell line UCLA-P3. (Strand, et al.,
"Molecular cloning and characterization of a human
adenocarcinoma/epithelial cell surface antigen
complementary DNA," Cancer Res 1989; 49:314-317). The
antigen has been found on all adenocarcinoma cells tested
and in various corresponding normal epithelial cells.
Northern blot analysis detected transcripts of the
adenocarcinoma-associated antigen in RNA isolated from
normal colon but not in RNA isolated from normal lung,
prostate, or liver. Therefore, identification
of adenocarcinoma-associated antigen in lung cells may
prove to be diagnostic for adenocarcinoma.

The cloning of CEA and the nonspecific crossreacting
antigen (NCA) has allowed the development of specific DNA
probes which discriminate their expression in lung cancer
at the mRNA level. (Hasegawa, et al., "Nonspecific
crossreacting antigen (NCA) is a major member of the
CEA-related gene family expressed in lung cancer," Br J
Cancer 1993; 67:58-65). NCA is a component of the CEA gene
family in lung cancer and is also recognized by anti-CEA
antibodies, especially polyclonal antibodies. Because of
the crossreactivity, investigations to analyze CEA and NCA
separately in lung disease had been difficult. The use of
DNA probes in Northern blot analysis determined that lung
cancer cells fall into three different types according to
their CEA and/or NCA expression. Specifically, lung
cancers expressed both CEA and NCA mRNA, only NCA mRNA, or
neither mRNA. CEA-related mRNA expression was always
accompanied by NCA mRNA expression and there were no cases
of CEA mRNA expression alone. The separate assessment of
CEA and NCA expression in lung cancers may be important in
determining the prognosis of lung cancers because the
antigens have been described as cell-cell adhesion
molecules and may play a role in cancer metastasis.
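
The three expression types described above amount to a
small decision rule on two Northern blot results. The
following Python sketch illustrates that rule only; the
function name and the class labels are assumptions
introduced here and do not come from the cited study.

```python
def classify_cea_nca(cea_mrna: bool, nca_mrna: bool) -> str:
    """Assign an expression class from the two Northern blot results."""
    if cea_mrna and nca_mrna:
        return "CEA+/NCA+"
    if nca_mrna:
        return "CEA-/NCA+"
    if not cea_mrna:
        return "CEA-/NCA-"
    # CEA mRNA without NCA mRNA was not seen in the study described above.
    return "CEA+/NCA- (not observed)"


for cea, nca in [(True, True), (False, True), (False, False), (True, False)]:
    print(cea, nca, classify_cea_nca(cea, nca))
```

The fourth branch is kept so that an unexpected result is
flagged rather than silently folded into one of the three
reported classes.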

Another method to detect the presence of an antigen
gene or its mRNA in specific cells, or to localize an
antigen gene to a specific locus on a chromosome, is in
situ hybridization. In situ hybridization uses nucleic
acid probes that recognize either repetitive sequences on
a chromosome or sequences along the whole chromosome
length or chromosome segments. By tagging the probes with
radioisotopes or color detection systems, chromosome
regions can be identified within the cell. Investigations
using in situ hybridization have demonstrated numerical
chromosomal abnormalities in samples from human tumors,
including bladder, neuroectodermal, breast, gastric and
lung cancer tumors. (Kim, et al., "Interphase cytogenetics
in paraffin sections of lung tumors by non-isotopic in
situ hybridization. Mapping genotype/phenotype
heterogeneity," Am J Pathol 1993; 142:307-317).

Fluorescence in situ hybridization (FISH) allows
cells to be stained so that genetic aberrations resulting
in changes in gene copy number or structure can be
quantitated by fluorescence microscopy. In this
technique, a chemically labeled single-stranded nucleic
acid probe homologous to the target nucleic acid sequence
is annealed to denatured nucleic acid contained in target
cells. The cells may be mounted on a microscope slide, in
suspension or prepared from paraffin-embedded material.
Treating the chemically modified probes with a fluorescent
ligand makes the bound probe visible. FISH has been used
for (1) detection of changes in gene copy number and gene
structure; (2) detection of genetic changes, even in low
frequency subpopulations; and (3) detection and
measurement of the frequency of residual malignant cells.
(Gray, et al., "Molecular cytogenetics in human cancer
diagnosis," Cancer 1992; 69:1536-1542).

Other molecular markers for lung cancer include
oncogenes and tumor suppressor genes. Dominant oncogenes
are activated by mutation and lead to deregulated cellular
growth. Such genes code for proteins that function as
growth factors, growth factor receptors, signal
transducing proteins and nuclear proteins involved in
transcriptional regulation. Amplification, mutation, and
translocations have been documented in many different
cancer cells and have been shown to lead to gene
activation or overexpression.

WO 96/02552                                  PCT/US95/09145

The ras family of oncogenes comprises a group of
membrane-associated GTP-binding proteins thought to be
involved in signal transduction. Mutations within the ras
oncogenes, resulting in sustained growth stimulation, have
been identified in 15 to 30% of human NSCLC. (Birrer, et
al., "Application of molecular genetics to the early
diagnosis and screening of lung cancer," Cancer Res 1992;
52(suppl):2658s-2664s). Patients with tumors containing
ras mutations had decreased survival compared with
patients whose tumors had no ras mutations. After
polymerase chain reaction (PCR) amplification, ras genes
can be analyzed for the presence of mutations by several
methods: (a) differential hybridization of 32P-labeled
mutated oligonucleotides; (b) identification of new
restriction enzyme sites created by the activating
mutation; (c) single-strand conformational polymorphisms;
and (d) nucleic acid sequencing. These methods combined
with PCR technology could allow detection of an activated
ras gene from sputum specimens.

Another family of dominant oncogenes, the erb B
family, has been found to be abnormally expressed in lung
cancer cells. This group codes for membrane-associated
tyrosine kinase proteins and contains erb B1, the gene
coding for the epidermal growth factor (EGF) receptor, and
erb B2 (also called Her-2/neu). The erb B1 gene has been
found to be amplified in NSCLC (up to 20% of squamous cell
tumors), while the EGF receptor has been shown to be
overexpressed in many NSCLC cells (approximately 90% of
squamous cell tumors, 20 to 75% of adenocarcinomas, and
rarely in large cell or undifferentiated tumors).
(Birrer, et al., Cancer Res 1992; 52(suppl):2658s-2664s).
Amplification of the related oncogene erb B2 (Her-2/neu)
occurs infrequently in lung cancer but is a negative
prognostic factor in breast cancer. However,
overexpression of the erb B2 protein product, p185neu,
occurs in some NSCLC and may be related to poor prognosis.
(Kern, et al., "p185neu expression in human lung
adenocarcinomas predicts shortened survival," Cancer Res
1990; 50:5184-5191).

A third family of dominant oncogenes involved in lung
cancer is the myc family. These genes encode nuclear
phosphoproteins, which have potent effects on cell growth
and which function as transcriptional regulators. Unlike
ras genes, which are activated by point mutations in lung
cancer cells, the myc genes are activated by
overexpression of the cellular myc genes, either by gene
amplification or by rearrangements, each ultimately
leading to increased levels of myc protein. Amplification
of the normal myc genes is seen frequently in SCLC and
rarely in NSCLC.

The loss or inactivation of tumor suppressor genes
may also be important steps in the pathway leading to
invasive cancer. Tumor suppressor genes function normally
to suppress cellular proliferation, and since they are
recessive oncogenes, mutations or deletions must occur in
both alleles of these genes before transformation occurs.

A phosphoprotein, p53, which is encoded by a gene
located on chromosome 17p, suppresses transformation in
its wild-type state. In its mutant state, p53 acts
as a dominant oncogene. p53 functions in DNA binding and
transcription activation. Mutations of p53 have been
found in many human cancers including colon, breast, brain
and lung cancer cells. (Birrer, et al., Cancer Res 1992;
52(suppl):2658s-2664s). In NSCLC cell lines,
p53 mutations have been found at a rate of up to 74%.
(Mitsudomi, et al., "p53 gene mutations in non-small-cell
lung cancer cell lines and their correlation with the
presence of ras mutations and clinical features," Oncogene
1992; 7:171-180).

Despite all of the advances made in the area of lung 
cancer, medical and surgical intervention has resulted in 
little change in the 5-year survival rate for lung cancer 
patients. Early detection holds the greatest hope for 
successful intervention. There remains a need for a 
practical method to diagnose lung cancer as close to its 
inception as possible. In order for early detection to be 
feasible, it is important that specific markers be found 
and their sequences elucidated. 

A lung cancer marker antigen, specific for NSCLC, has 
now been found, sequenced, and cloned. The antigen is 
useful in methods for detection of non-small cell lung 
cancer and for potential production of antibodies and 
probes for treatment compositions.

BRIEF DESCRIPTION OF THE DRAWING

FIGURE 1 depicts the alignment of the amino acid 
sequence of HCAVIII with previously described carbonic 
anhydrases. Conserved amino acids are shown in bold.

SUMMARY OF THE INVENTION

The invention concerns a lung cancer antigen 
(HCAVIII) gene specific for non-small cell lung cancer. 

In one embodiment, the invention relates to a
substantially purified nucleic acid (SEQ ID NO:1) encoding
the pre-protein sequence shown in SEQ ID NO:2.

In other embodiments, the invention relates to cDNAs
which encode the mature form of the protein (SEQ ID NO:4),
or a truncated form of the protein lacking the
transmembrane domain (SEQ ID NO:13 and SEQ ID NO:15), or a
protein in which one or more of the amino acids in the
phosphorylation region have been altered to affect that
function, an example of which is shown in SEQ ID NO:18.

In other embodiments, proteins encoded by the cDNA of
SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:12, SEQ
ID NO:14, and SEQ ID NO:17 are provided.

In another aspect, the invention relates to a
recombinant DNA clone for HCAVIII.

In further aspects of the invention, expression
vectors for HCAVIII and modifications thereof are an
object.

The invention further relates to methods of detecting 
lung cancer. 

In one aspect an in situ hybridization technique is 
provided. In another aspect, a fluorescence in situ 
hybridization technique is provided. In a further aspect, 
an ELISA assay is provided. In another aspect, detection 
of carbonic anhydrase activity which correlates with lung 
cancer antigen is provided.

DETAILED DESCRIPTION OF THE INVENTION

The nucleic acid sequence coding for a cell surface
protein (said protein hereinafter designated HCAVIII)
which is highly specific for non-small cell lung cancer
cells has now been obtained. This gene sequence will
facilitate detection and treatment of the disease, which
to date has often proven difficult.

The HCAVIII cDNA in the vector pLC56 has been
sequenced and characterized, including the entire coding
region and substantially all of the upstream and
downstream non-translated regions. The cDNA in pLC56 was
sequenced on both strands from exonuclease III-generated
deletions and subsequent subcloning into M13 vectors, or
directly from the cloning vectors, using the dideoxy
method and a SEQUENASE® Version 2.0 kit (U.S.
Biochemicals, Cleveland, OH). Additional regions of DNA
were subcloned as small restriction fragments into the
same vectors for sequence analysis. Overlapping segments
were ordered using MacVector Align software (Kodak/IBI
Technologies, New Haven, CT). SEQ ID NO:1 represents the
cDNA encoding HCAVIII and a presumed signal peptide. SEQ
ID NO:2 represents the signal peptide (amino acid residues
-29 to -1) followed by the mature protein (amino acid
residues 1 to 325). As predicted from the cDNA sequence
in pLC56, a protein of about 354 amino acids is encoded
with a predicted size of 39448 daltons. A
hydrophilicity plot (MacVector software, Kodak/IBI
Technologies) of this protein provided strong evidence of
a leader peptide at the N-terminus and a membrane-spanning
segment near the C-terminus. The membrane-spanning
segment provides evidence that this protein is membrane
bound, as also predicted by its positive selection with
panning methodology (See Watson, et al., Recombinant DNA,
2nd ed., pp. 115-116, 1992). The cleavage site of the
signal peptide as predicted by von Heijne (von Heijne,
Gunnar, Nucleic Acids Res 1986; 14:4683-4690) is 29 amino
acids down from the N-terminal methionine. SEQ ID NO:3
corresponds approximately to the coding region of the
mature polypeptide. The subsequent "mature" protein is
proposed to be 325 amino acids, initiating with serine,
and of a calculated 36401 daltons and a pI of 6.42 (SEQ ID
NO:4).
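
The residue numbering and size figures above are mutually
consistent, which a short calculation makes explicit. The
following Python sketch is illustrative only: the function
and constant names are assumptions introduced here, and the
lengths and masses are simply those quoted in the text.

```python
SIGNAL_PEPTIDE_LENGTH = 29   # residues -29 to -1
MATURE_LENGTH = 325          # residues 1 to 325
PREPROTEIN_LENGTH = SIGNAL_PEPTIDE_LENGTH + MATURE_LENGTH


def to_signed_residue(preprotein_position: int) -> int:
    """Map a 1-based pre-protein position to the -29..-1, 1..325 scheme.

    There is no residue 0: the last signal peptide residue is -1 and
    the first mature residue is 1.
    """
    if not 1 <= preprotein_position <= PREPROTEIN_LENGTH:
        raise ValueError("position outside the pre-protein")
    offset = preprotein_position - SIGNAL_PEPTIDE_LENGTH
    return offset if offset > 0 else offset - 1


# Predicted masses quoted in the text, in daltons.
PREPROTEIN_MASS = 39448
MATURE_MASS = 36401
signal_peptide_mass = PREPROTEIN_MASS - MATURE_MASS

print(PREPROTEIN_LENGTH)                    # 354
print(to_signed_residue(1))                 # -29
print(to_signed_residue(30))                # 1
print(signal_peptide_mass)                  # 3047
```

The difference of 3047 daltons corresponds to the 29
residues removed on cleavage, about 105 daltons per
residue, which is in the range expected for an average
amino acid residue.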

Homology searches against NCBI BlastN or BlastX 
version 1.3.12MP (National Center for Biotechnology 
Information, Bethesda, MD) provided evidence the gene and 
protein are novel, not previously identified in either 
database. (Altschul, et al., "Basic local alignment 
search tool," J Mol Biol 1990; 215:403-410). Additional 
searches against another database (Entrez, version 9) gave 
similar results. 

The isolation of a second cDNA encoding HCAVIII
permitted the identification of new sequences within the
5' and 3' untranslated regions of this gene. SEQ ID
NO:5, a cDNA encoding HCAVIII and a portion of the 5' and
3' nontranslated regions, has substantial identity with
SEQ ID NO:1 (positions 1-1104 of SEQ ID NO:1 are identical
to positions 85-1188 of SEQ ID NO:5). The encoded protein
is listed in SEQ ID NO:6 and is identical with SEQ ID
NO:2. Homology searches of NCBI BlastN against SEQ ID
NO:5 showed these gene sequences have not been previously
identified. SEQ ID NO:7 represents additional cDNA
sequences of the 3' nontranslated region of the HCAVIII
gene located downstream from the sequences depicted in SEQ
ID NO:5. Homology searches against the same database
identified two clones with homology to SEQ ID NO:7. Both
sequences are expressed sequence tags (ESTs), the first
EST04899 (345 bp) and the second HUMGS04024 (466 bp).
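
The stated correspondence between SEQ ID NO:1 and SEQ ID
NO:5 is a fixed offset of 84 positions over the shared
region. A minimal Python sketch of the coordinate
conversion follows; the function names are hypothetical
and only the position ranges come from the text.

```python
# Shared region as stated: 1-1104 of SEQ ID NO:1 = 85-1188 of SEQ ID NO:5.
SEQ1_SHARED = range(1, 1105)
SEQ5_SHARED = range(85, 1189)
OFFSET = SEQ5_SHARED.start - SEQ1_SHARED.start


def seq1_to_seq5(position: int) -> int:
    """Convert a SEQ ID NO:1 position to the matching SEQ ID NO:5 position."""
    if position not in SEQ1_SHARED:
        raise ValueError("position is outside the shared region of SEQ ID NO:1")
    return position + OFFSET


def seq5_to_seq1(position: int) -> int:
    """Convert a SEQ ID NO:5 position to the matching SEQ ID NO:1 position."""
    if position not in SEQ5_SHARED:
        raise ValueError("position is outside the shared region of SEQ ID NO:5")
    return position - OFFSET


print(OFFSET)               # 84
print(seq1_to_seq5(1104))   # 1188
```

Positions 1-84 of SEQ ID NO:5 lie outside the shared
region, so the conversion rejects them rather than
returning a zero or negative coordinate.
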

Alignment searches indicate this protein shares
common features with the seven human carbonic anhydrase
proteins previously identified. However, as described
below, certain structural features distinct to HCAVIII
exist that may confer unique properties to this protein
and a role in the transformation pathway to tumorigenicity.
This group of enzymes catalyzes the hydration of carbon
dioxide

$$\mathrm{CO_2 + H_2O \rightleftharpoons HCO_3^- + H^+}$$

and in reverse the dehydration of $\mathrm{HCO_3^-}$. This
protein is identified as a carbonic anhydrase (CA) based on
the conservation of amino acids at positions critical for
the binding of $\mathrm{Zn^{2+}}$ and the catalysis of
$\mathrm{CO_2}$ hydration, as well as numerous other
conserved amino acids (see Fig. 1). The
protein is 34 to 64 amino acids longer (at the C-terminus)
than any previously reported carbonic anhydrase by virtue
of the membrane-spanning region also found in HCAIV and an
additional approximately 30 amino acids located on the
cytoplasmic side of the cell membrane and apparently
missing in other human CA isoforms. In addition, this
intracellular domain contains a phosphorylation site
recognized by protein kinase C and other kinases, as
defined by the motif "Arg-Arg-Lys-Ser" (SEQ ID NO:8 and
SEQ ID NO:9) (amino acid residues 1-4 in SEQ ID NO:9 and
amino acid residues 299-302 in SEQ ID NO:2, SEQ ID NO:4
and SEQ ID NO:6). Interestingly, this motif is found only
in HCAVIII, and at a functionally significant site, i.e.,
within the cytosol. A surface cleft essential for enzymatic
function present on other carbonic anhydrases is conserved
for this protein, suggesting that this protein will also
possess enzymatic activity. Five possible N-glycosylation
sites are predicted by the primary amino acid sequence and
the motif "Asn-Xaa-Ser(Thr)", beginning at amino acid
residues -2, 51, 133, 151, and 202 in SEQ ID NO:2,
respectively.
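
Predicting N-glycosylation sites from the motif
"Asn-Xaa-Ser(Thr)" is a simple pattern scan over the
primary sequence. The following Python sketch shows one
way to do it under stated assumptions: the toy sequence
and its signal peptide length are hypothetical and are not
taken from SEQ ID NO:2, and the exclusion of proline at
the Xaa position is a common convention that the text
itself does not state.

```python
import re
from typing import List

# Asn-Xaa-Ser/Thr, written as a lookahead so overlapping motifs are all found.
# Excluding proline at Xaa is an assumption, not something the text specifies.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return the 1-based position of the Asn in each Asn-Xaa-Ser/Thr motif."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


def to_signed_residue(position: int, signal_length: int) -> int:
    """Renumber so the signal peptide is negative and the mature protein starts at 1."""
    offset = position - signal_length
    return offset if offset > 0 else offset - 1


TOY_PROTEIN = "MNASQNPTANGT"   # hypothetical sequence, not HCAVIII
TOY_SIGNAL_LENGTH = 3          # hypothetical signal peptide length

positions = find_sequons(TOY_PROTEIN)
signed = [to_signed_residue(p, TOY_SIGNAL_LENGTH) for p in positions]
print(positions)   # [2, 10]
print(signed)      # [-2, 7]
```

A negative result such as -2 marks a motif that begins
inside the signal peptide, which is how a site at residue
-2 can appear in the list given for SEQ ID NO:2.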

HCAVIII is expressed at a much higher level in a
non-small cell lung cancer cell line (A549) than in normal
lung tissue, other normal tissues, and other tumor cell
lines, which makes it useful in distinguishing this
disease. This is clearly demonstrated in Table 1. Data
for this table were obtained as follows. Total cellular
RNA was isolated from the indicated actively growing cell
lines as described by Chirgwin, et al., "Isolation of
biologically active ribonucleic acid from sources enriched
in ribonuclease," Biochemistry 1979; 18:5294-5299. RNA
samples were fractionated over a 1% agarose-formaldehyde
gel and transferred to a nylon membrane (Qiagen,
Chatsworth, CA) by capillary action. The hybridization
probe was generated from a 1 kilobase pair BstXI
restriction fragment isolated from pLC56, a plasmid
harboring the HCAVIII gene in its initial isolation. This
fragment was radiolabeled with 32P using a PRIME-IT®
Random Primer Labeling Kit obtained from Stratagene, La
Jolla, CA. A membrane containing RNA derived from healthy
human tissue was purchased from Clontech Laboratories,
Inc., Palo Alto, CA. RNA blots were hybridized in a
standard cocktail containing 32P-labeled probe at 42°C
overnight, then exposed to X-ray film. After removal of
the probe, the same blots were rehybridized with
a second 32P-labeled DNA from β-actin to serve as a
positive control for integrity of the blotted RNA.

As shown in Table 1, normal lung tissue does not 
express the HCAVIII gene in detectable amounts. Other 
tumor cell lines fail to express, or express only in minor 
amounts, which will allow easy distinction of non-small 
cell carcinomas. 



TABLE 1. NORTHERN BLOTS USING HCAVIII cDNA AGAINST NORMAL
TISSUES AND TUMOR CELL LINES

TISSUE                         mRNA (kb)    INTENSITY

NORMAL TISSUE
heart                          nd(a)
brain                          4.5          1X(b)
placenta                       4.5          1X
lung                           nd
liver                          nd
skeletal muscle                nd
kidney                         4.5          100X
pancreas                       4.5          10X

TUMOR CELL LINE
A549 (lung carcinoma)          3.5          5000X
                               5.4          50X
                               8.0          25X
                               9.0          25X
BT20 (breast carcinoma)        nd
G361 (melanoma)                nd
HT144 (melanoma)               nd
U937 (histiocytic lymphoma)    nd
KG-1 (myelogenous leukemia)

(a) nd = none detected
(b) 1X = at limit of detection

In one embodiment of the invention, probes are made
corresponding to sequences of the cDNA shown in SEQ ID
NO:3, which are complementary to the mRNA for HCAVIII.
These probes can be radioactively or non-radioactively
labeled in a number of ways well known to the art. The
probes can be made of various lengths. Such factors as
stringency and GC content may influence the desired probe
length for particular applications. The probes correspond
to a length of 10-986 nucleotides from SEQ ID NO:3. The
labeled probes can then be bound to detect the presence or
absence of mRNA encoding HCAVIII in biopsy material
through in situ hybridization. The mRNA is expected to be
associated with the presence of non-small cell tumors and
to be a marker for the precancerous condition as well.
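
Selecting a probe that is complementary to the mRNA, and
checking its GC content, can be sketched in a few lines.
The Python below is a minimal illustration and not the
procedure used in the Examples: the toy cDNA is
hypothetical (SEQ ID NO:3 is not reproduced here), the
function names are assumptions, and only the 10-986
nucleotide length range comes from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MIN_PROBE_LENGTH = 10    # lower bound stated in the text
MAX_PROBE_LENGTH = 986   # upper bound stated in the text


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(dna: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def antisense_probe(sense_cdna: str, start: int, length: int) -> str:
    """Return a probe complementary to sense_cdna[start:start + length].

    The result is written with DNA letters; an RNA probe of the same
    sequence would carry U in place of T.
    """
    if not MIN_PROBE_LENGTH <= length <= MAX_PROBE_LENGTH:
        raise ValueError("probe length outside the 10-986 nucleotide range")
    if start < 0 or start + length > len(sense_cdna):
        raise ValueError("probe window runs past the end of the cDNA")
    return reverse_complement(sense_cdna[start:start + length])


TOY_CDNA = "ATGGCCAAGCTT"   # hypothetical sense-strand fragment
probe = antisense_probe(TOY_CDNA, 0, 10)
print(probe)                # GCTTGGCCAT
print(gc_fraction(probe))   # 0.6
```

Because the probe is the reverse complement of the
sense-strand cDNA, it pairs with the mRNA, and its GC
fraction equals that of the target window it was taken
from.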

In situ hybridization provides a specificity to the
target tissue that is not obtainable in Northern, PCR or
other probe-driven technologies. In situ hybridization
permits localization of signal in mixed-tissue specimens
commonly found in most tumors and is compatible with many
histologic staining procedures. This technique
comprises three basic components. First is the
preparation of the tissue sample provided by the
pathologist to permit successful hybridization to the
probe. Second is the preparation of the hybridization
probe, typically an RNA complementary to the mRNA of the
gene of interest (i.e., antisense RNA). RNA probes are
preferred over DNA probes for in situ hybridizations
mainly because background hybridization of the probe to
irrelevant nucleic acids or nonspecific attachment to cell
debris or subcellular organelles can be eliminated with
RNAse treatment post-hybridization. Third is the
hybridization and post-hybridization detection. Typically
the RNA transcript probe has been radiolabeled by the
incorporation of 32P or 35S nucleotides to permit
subsequent detection of the probed specimen by
autoradiography or quantitation of silver grains following
treatment with autoradiographic emulsion. Nonradioactive
detection systems have also been developed. In one
example, biotinylated nucleotides can be substituted for
the radioactive nucleotide in the RNA probe preparation,
permitting visualization of the probed sample by
immunocytochemistry-derived techniques. Example 1
describes in situ hybridization procedures using RNA
probes derived from the HCAVIII gene. Example 2 provides
exemplary fluorescent in situ hybridization (FISH)
procedures.
The cDNA for HCAVIII (SEQ ID NO:3) is currently in an
expression vector which is used to generate the protein
in E. coli. This expression system, described in Example 3,
produces HCAVIII to be used as an antigen for the
generation of antibodies (Example 4) for use in an ELISA
assay to detect shed HCAVIII in body fluids as described
in Example 5. The methods for production of antibodies
and ELISA type assays are well known in the art.
Exemplary methods and components of these procedures have
been chosen and developed and are described in Examples 4
and 5.

The expression and purification of foreign proteins
in E. coli is often problematic. On occasion, the protein
is expressed at high levels but is deposited within the
cell in an insoluble, denatured form termed an inclusion
body. These bodies are often observed when the foreign
protein contains a hydrophobic domain, such as found in
the membrane spanning segment of HCAVIII. Through
recombinant DNA technology, the DNA sequences encoding the
membrane spanning segment of HCAVIII are deleted. The
protein expressed in E. coli from this engineered plasmid
is now in a soluble and native form within the cell,
permitting a rapid and less harsh purification. In
addition, the ELISA test to measure HCAVIII shed into body
fluids as described in Example 5 relies on the recombinant
protein produced from E. coli. Typically, the shed
antigen is a membrane-bound receptor that was released
from the membrane spanning segment anchoring it to the
cell. Consequently, the recombinant HCAVIII engineered to
remove the membrane spanning segment is a more accurate
representation of the putative HCAVIII shed antigen found
in specimens and may prove to be the preferred antigen for
polyclonal antisera and monoclonal antibody production as
described for the development of an ELISA test.

To produce the engineered plasmid, a first plasmid is
constructed by cleaving pLC56 with the restriction enzyme
Tth111 I, followed by treatment with T4 DNA polymerase and
dGTP, dATP, dTTP and dCTP, and finally with alkaline
phosphatase to remove 5'-terminal phosphates. The DNA
sample is then purified by phenol/chloroform extraction
and ethanol precipitation. The sample is digested with
the restriction endonuclease BspEI, then the fragments are
resolved by agarose gel electrophoresis to permit the
isolation of a 267 base pair fragment. A second plasmid,
described previously for expression of the HCAVIII mature
protein (SEQ ID NO:4), is cleaved with EcoRI and BspEI
followed by alkaline phosphatase treatment and
purification by phenol/chloroform extraction and ethanol
precipitation. Two oligonucleotides are synthesized,
being 5'-TGAGTCGACG (SEQ ID NO: 10) and 5'-AATTCGTCGACTCA
(SEQ ID NO:11), that complement each other and, upon
annealing, provide a termination codon (TGA) and sequence
complementary to EcoRI cleaved DNA. Finally, the two
oligonucleotides, the 267 base pair fragment, and the
BspEI/EcoRI cleaved plasmid are combined in a ligation
reaction, and the resultant plasmid, which contains the
truncated DNA sequence (SEQ ID NO: 12), is used to transform
competent E. coli. Upon expression in E. coli, the
resulting truncated protein (SEQ ID NO: 13) is 271 amino
acids as determined by SDS polyacrylamide electrophoresis
and of a size consistent with other HCA's but lacking the
membrane spanning segment and the intracellular domain. A
second plasmid encoding a HCAVIII truncated protein (SEQ
ID NO: 14) lacking the membrane spanning segment and
intracellular domain was created as described above,
except that restriction enzyme Ple I was substituted for
Tth111 I, resulting in a gel purified DNA fragment of 276
base pairs. Upon expression in E. coli, the resulting
protein is now 274 amino acids (SEQ ID NO: 15).
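
The pairing of the two adaptor oligonucleotides can be checked with a few lines of code. The following is a minimal sketch, not part of the disclosed method: it takes the two sequences as printed for SEQ ID NO: 10 and SEQ ID NO: 11, and the function names are hypothetical. It confirms that the shorter oligonucleotide is fully complementary to the 3' portion of the longer one, so that annealing would leave a 5'-AATT single-stranded extension of the kind produced by EcoRI cleavage, with the TGA termination codon at the start of the top strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def adaptor_overhang(top: str, bottom: str) -> str:
    """Return the unpaired 5' extension of `bottom` after annealing to `top`.

    Assumes (as the text describes) that `top` pairs with the 3' end of
    `bottom`; raises ValueError if the two strands are not complementary.
    """
    if not bottom.endswith(reverse_complement(top)):
        raise ValueError("oligonucleotides are not complementary")
    return bottom[: len(bottom) - len(top)]


TOP = "TGAGTCGACG"         # SEQ ID NO: 10 as printed in the text
BOTTOM = "AATTCGTCGACTCA"  # SEQ ID NO: 11 as printed in the text

print(adaptor_overhang(TOP, BOTTOM))  # unpaired 5' extension of the bottom strand
print(TOP[:3])                        # first three bases of the top strand
```

Under these assumptions the blunt end of the adaptor would join the polymerase-filled end of the 267 base pair fragment, and the AATT extension would join the EcoRI end of the cleaved plasmid.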

An understanding of protein phosphorylation and its
role in the mechanism of cell transformation has been
actively pursued, most notably with tyrosine
phosphorylation and oncogene activation. The role of
serine/threonine protein phosphorylation by a variety of
protein kinases including protein kinase C has been
studied extensively with respect to signal transduction,
but its role in oncogenesis is less clear. To provide a
valuable tool to be used in the study of the role of
HCAVIII serine phosphorylation in oncogenesis, an altered
cDNA can be prepared to code for an altered protein.
Changes to amino acids other than "Gly" may be realized by
alterations to the oligonucleotide sequence (SEQ ID NO: 16)
used to encode the selected residue. Other modifications
to alter the serine phosphorylation site would utilize the
described technology to modify either or both "Arg" residues
located within SEQ ID NO: 9 (amino acid residues 299 and
300 of SEQ ID NO:2, SEQ ID NO: 4 and SEQ ID NO: 6). Since
"Arg" residues contain a net positive charge, the
substituted amino acids would preferably be "Lys" or
"His," also positively charged amino acids. An exemplary
plasmid is produced in which the "Ser" codon (amino acid
residue 4 of SEQ ID NO: 9; amino acid residue 302 in SEQ ID
NO:2, SEQ ID NO: 4 and SEQ ID NO: 6) is converted to a
"Gly" codon using an in vitro mutagenesis technique
described in Example 3 and previously recited in Kunkel,
Thomas, "Rapid and efficient site-specific mutagenesis
without phenotypic selection," Proc Natl Acad Sci USA
1985; 82:488-492, and the oligonucleotide 5'-
CTTTTTTGATACCCTTCCTTCTGAA (SEQ ID NO: 16) (located in SEQ
ID NO:1 at the base pairs 1010-1034 with 1022 as the
mutagenized base pair). The DNA sequences containing the
HCAVIII gene engineered for production of the mature
protein and mutagenized codon are released from the
mutagenesis vector by BamHI and EcoRI restriction
endonucleases and ligated into pGEX4T1 cleaved with the
same enzymes, and the resultant plasmid is used to
transform competent E. coli. The codon mutagenesis is
confirmed by DNA sequence analysis, and the protein is
expressed and purified from E. coli as described in
Example 3. The DNA sequence of the altered plasmid as
shown in SEQ ID NO: 17 differs from the gene encoding the
mature protein (SEQ ID NO: 3) in that the nucleotide 1022
is changed from "A" to "G", and the protein sequence (SEQ
ID NO: 18) expressed by the altered plasmid is identical to
the mature protein (SEQ ID NO: 4) except that amino acid
residue 302 is changed from "Ser" to "Gly."
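
The stated coordinates of the mutagenic oligonucleotide can be cross-checked against one another. The sketch below is illustrative only and rests on labeled assumptions: that SEQ ID NO: 16 is written as the antisense strand of positions 1010-1034 of SEQ ID NO:1, that the coding sequence begins at position 32 (the CDS feature of the sequence listing), and that the signal peptide is 29 residues long (the first residue is numbered -29 in the listing). The constant and function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


MUTAGENIC_OLIGO = "CTTTTTTGATACCCTTCCTTCTGAA"  # SEQ ID NO: 16 as printed
OLIGO_FIRST_POSITION = 1010   # first base pair covered by the oligonucleotide
MUTATED_POSITION = 1022       # mutagenized base pair stated in the text
CDS_START = 32                # assumption: CDS feature 32..1093 of SEQ ID NO:1
SIGNAL_PEPTIDE_LENGTH = 29    # assumption: residues numbered -29 to -1

# Only the two codons needed for this check are listed (standard genetic code).
CODON_TO_RESIDUE = {"AGT": "Ser", "GGT": "Gly"}


def describe_mutation():
    """Return (residue number, wild-type codon, mutant codon) for the change."""
    sense = reverse_complement(MUTAGENIC_OLIGO)
    offset = MUTATED_POSITION - OLIGO_FIRST_POSITION
    # The mutated base must be the first base of a codon for this slice to
    # be a codon; the frame check below verifies that.
    if (MUTATED_POSITION - CDS_START) % 3 != 0:
        raise ValueError("mutated position is not the first base of a codon")
    mutant_codon = sense[offset:offset + 3]
    wild_type_codon = "A" + mutant_codon[1:]  # text: nucleotide 1022 was "A"
    codon_index = (MUTATED_POSITION - CDS_START) // 3 + 1
    residue_number = codon_index - SIGNAL_PEPTIDE_LENGTH
    return residue_number, wild_type_codon, mutant_codon


residue, wild_type, mutant = describe_mutation()
print(residue, CODON_TO_RESIDUE[wild_type], "->", CODON_TO_RESIDUE[mutant])
```

Under these assumptions the residue number, the A-to-G change at position 1022 and the Ser-to-Gly substitution described in the text are mutually consistent.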

Another way to detect the presence of increased 
HCAVIII could be to assay for levels of carbonic anhydrase 
activity in biopsy materials as described in Example 6. 
This should be a useful test as HCAVIII, although it is an 
immunologically unique molecule, contains small but 
distinct regions which are conserved between previously 
reported carbonic anhydrase proteins. 

In another embodiment of the invention, primers are
made complementary to the HCAVIII cDNA (SEQ ID NO: 3) for
detecting expression of the gene. PCR amplification of 
cDNA from lung biopsy cells would indicate the presence of 
the same non-small cell lung carcinoma. 

Due to the non-small cell lung cancer specificity of 
HCAVIII and the gene encoding the protein, antibodies 
specific for HCAVIII would also exhibit non-small cell 
lung cancer specificity which can be employed for 
diagnostic detection of HCAVIII in body fluids such as 
serum or urine or HCAVIII containing cells. Targeting of 
cancer therapeutic drugs to HCAVIII containing cells can 
also be developed using HCAVIII specific antibodies. The 
genetic expression of the gene encoding HCAVIII could be 
modulated by drugs or anti-sense technology resulting in 
an alteration of the cancer state of the HCAVIII 
containing cells. 

Example 1 

In Situ Hybridization Using RNA Probes
Derived from the HCAVIII Gene

Tissue samples are treated with 4% paraformaldehyde
(or equivalent fixative), dehydrated in sequential ethanol
solutions of increasing concentrations (e.g., 70%, 95% and
100%) with a final xylene incubation (see Current
Protocols in Molecular Biology, pp. 14.01-14.3 and
Immunocytochemistry II: IBRO Handbook Series: Methods in
the Neurosciences Vol 14; pp 281-300, incorporated herein
by reference). The tissue is embedded in molten paraffin,
molded in a casting block and can be stored at room
temperature. Tissue slices, typically 8 µm thick, are
prepared with a microtome, dried onto gelatin-treated
glass slides and stored at -20°C.

DNA sequences from the HCAVIII gene (SEQ ID NO: 3) are
subcloned into a plasmid engineered for production of RNA
probes. In this example, a 776 bp DNA fragment is
released from a pLC56 plasmid following BamHI/AccI
digestion, where the BamHI site has been created by in
vitro mutagenesis (see E. coli expression below). This
fragment is ligated into pGEM-2 (Promega Biotec, Madison,
WI) that was cleaved with BamHI and AccI and transformed
into competent E. coli. This constructed plasmid contains
the T7 RNA polymerase promoter downstream of the AccI
restriction site and hence can drive transcription of the
antisense HCAVIII sequences defined by the BamHI/AccI
fragment. Following linearization of the subsequent
plasmid with BamHI, an in vitro transcription reaction
composed of transcription buffer (40 mM Tris-HCl, pH 7.5,
6 mM MgCl2, 2 mM spermidine, 10 mM NaCl, 10 mM
dithiothreitol, 1 U/µl ribonuclease inhibitor), linearized
plasmid, 10 mM GTP, 10 mM ATP, 10 mM CTP, 100 µCi of
(35S)UTP, and T7 RNA polymerase is incubated at 37°C.
Multiple RNA copies of the gene are produced that then are
used as a hybridization probe. The reaction is terminated
by the addition of DNAase, and the synthesized RNA is
recovered from unincorporated nucleotides by
phenol/chloroform extraction and sequential ethanol
precipitations in the presence of 2.5 M ammonium acetate.

The slides containing fixed, sectioned tissues are
rehydrated in decreasing concentration of ethanol (100%,
70% and 50%), followed by sequential treatments with 0.2 N
HCl, 2X SSC (where 20X SSC is 3 M NaCl and 0.3 M sodium
citrate) at 70°C to deparaffinate the sample, phosphate
buffered saline (PBS), fixation in 4% paraformaldehyde and
PBS wash. The slides are blocked to prevent nonspecific
binding by the sequential additions of PBS/10 mM
dithiothreitol (45°C), 10 mM dithiothreitol/0.19%
iodoacetamide/0.12% N-ethylmaleimide and PBS wash. The
slides are equilibrated in 0.1 M triethylamine, pH 8.0,
followed by treatment in 0.1 M triethylamine/0.25% acetic
anhydride and 0.1 M triethylamine/0.5% acetic anhydride
and washed in 2X SSC. The slides are then dehydrated in
increasing concentrations of ethanol (50%, 70% and 100%)
and stored at -80°C.

A hybridization mix is prepared by combining 50%
deionized formamide, 0.3 M NaCl, 10 mM Tris-HCl, pH 8.0, 1
mM EDTA, 1X Denhardt's solution (0.02% Ficoll 400, 0.02%
polyvinylpyrrolidone, 0.02% bovine serum albumin (BSA)),
500 µg/ml yeast tRNA, 500 µg/ml poly(A), 50 mM
dithiothreitol, 10% polyethyleneglycol 6000 and the 35S-
labeled RNA probe. This solution is placed on the fixed,
blocked tissue slides which are then incubated at 45°C in
a moist chamber for 0.5 to 3 hours. The slides are washed
to remove unbound probe in 50% formamide, 2X SSC, 20 mM 2-
mercaptoethanol (55°C), followed by 50% formamide, 2X SSC,
20 mM 2-mercaptoethanol and 0.5% Triton-X 100 (50°C) and
finally in 2X SSC/20 mM 2-mercaptoethanol (room
temperature). The slides are treated with 10 mM Tris-HCl,
pH 8.0/0.3 M NaCl/40 µg/ml RNase A/2 µg/ml RNAse T1 (31°C)
to reduce levels of unbound RNA probe. Following RNAse
treatment, the slides are washed in formamide/SSC buffers
at 50°C, room temperature and then dehydrated in
increasing ethanol concentrations containing 0.3 M
ammonium acetate, and one final 100% ethanol wash. The
slides are then exposed to X-ray film followed by emulsion
autoradiography to detect silver grains.

Test tissue samples are compared to matched controls
derived from normal lung tissue. Evidence of elevated
transcription of the HCAVIII gene in test tissue compared
to normal tissue, as determined by autoradiography (X-ray
film) or alternatively by the quantitation of silver
grains following emulsion autoradiography, would provide
evidence of a positive diagnosis for lung cancer.

Example 2

Fluorescent In Situ Hybridization (FISH) Using DNA Probes
Derived from the HCAVIII Gene 

A genomic clone to the HCAVIII gene (SEQ ID NO:1) is
isolated using a PCR primer pair which has been
identified from the pLC56 cDNA sequence. This primer pair
is located in putative exon 6 of the pLC56 gene, and the
primers are identified as Probe Exon 6A (5'-ACATTGAAGAGCTGCTTCCGG-
3'; SEQ ID NO: 19) and Probe Exon 6B (5'-
AATTTGCACGGGGTTTCGG-3'; SEQ ID NO:20). The genomic clone
of HCAVIII is then identified as a PCR product of about
119 bp using this primer pair from the designated genomic
clone. This result is confirmed by Southern blotting and
DNA sequence analysis. A sequence of 1363 bp derived from
the HCAVIII genomic clone is reported in SEQ ID NO: 21.
This sequence is located directly before the HCAVIII cDNA
and constitutes the putative promoter of this gene and
likely contains transcription regulatory elements directly
implicated in HCAVIII expression.
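
The expected size of the PCR product can be reproduced from the cDNA sequence. The following is a minimal in silico sketch under stated assumptions: the template string is positions 629-772 of SEQ ID NO:1 as transcribed from the sequence listing in this document, the genomic template is assumed to contain no intron between the two primer sites (both lie in putative exon 6), and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Positions 629-772 of SEQ ID NO:1, transcribed from the sequence listing.
CDNA_629_772 = (
    "GCATTCGTCCCGGGATTCAACATTGAAGAGCTGCTTCCGGAGAGGACC"
    "CCTGAATATTACCGCTACCGGGGGTCCCTGACCACACCGCCTTGCAAC"
    "CCCACTGTGCTCTGGACAGTTTTCCGAAACCCCGTGCAAATTTCCCAG"
)
FIRST_POSITION = 629

PROBE_EXON_6A = "ACATTGAAGAGCTGCTTCCGG"  # SEQ ID NO: 19 (forward primer)
PROBE_EXON_6B = "AATTTGCACGGGGTTTCGG"    # SEQ ID NO: 20 (reverse primer)


def predicted_amplicon(template, forward, reverse, first_position=1):
    """Return (start, end, length) of the product, in template coordinates.

    The forward primer must match the template strand as written and the
    reverse primer must match its reverse complement, downstream of the
    forward primer site.
    """
    start = template.find(forward)
    if start < 0:
        raise ValueError("forward primer site not found")
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start)
    if site < 0:
        raise ValueError("reverse primer site not found")
    end = site + len(reverse_site)
    return first_position + start, first_position + end - 1, end - start


print(predicted_amplicon(CDNA_629_772, PROBE_EXON_6A, PROBE_EXON_6B, FIRST_POSITION))
```

With this transcription the primers bracket positions 648 to 766 of the cDNA, which agrees with the product of about 119 bp described above.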

The DNA probe comprising the genomic clone of HCAVIII
plus flanking sequences is labeled in a random primer
reaction with digoxigenin-11-dUTP (Boehringer Mannheim
Biochemicals, Indianapolis, IN) by combining the DNA with
dNTP (-TTP, final 0.05 mM), digoxigenin-11-dUTP/dTTP
(0.0125 mM and 0.0375 mM, final), 10 mM 2-mercaptoethanol,
50 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 20 U of DNA
polymerase I and 1 ng/ml DNAase. The reaction is
incubated at 15°C for two hours, and then terminated by
adding EDTA to a final concentration of 10 mM. The
labeled DNA probe is further purified by gel filtration
chromatography. It is apparent to those skilled in the
art that other suitable substrates such as biotin-11-dUTP
can be substituted for digoxigenin-11-dUTP in the
procedure above.

A hybridization mix is prepared by combining 50%
deionized formamide, 0.3 M NaCl, 10 mM Tris-HCl, pH 8.0, 1
mM EDTA, 1X Denhardt's solution (0.02% Ficoll 400, 0.02%
polyvinylpyrrolidone, and 0.02% bovine serum albumin), 500
µg/ml yeast tRNA, 500 µg/ml poly(A), 50 mM dithiothreitol,
10% polyethyleneglycol 6000, and the labeled DNA probe.

Single cell suspensions of tissue biopsy material or
normal tissue are fixed in methanol/glacial acetic acid
(3:1 vol/vol) and dropped onto microscope slides
(Anastasi, et al., "Detection of Trisomy 12 in chronic
lymphocytic leukemia by fluorescence in situ hybridization
to interphase cells: a simple and sensitive method," Blood
1992; 77:2456-2462). After the slides are heated for 1-2
hours at 60°C, the hybridization mix is applied to the
slides which are then incubated at 45°C in a moist chamber
for 0.5-3 hours. After incubation, the slides are washed 
three times with a solution comprising 50% formamide and 
2X SSC at 42°C, washed twice in 2X SSC at 42°C, and 
finally washed in 4X SSC at room temperature. The slide 
is blocked with a solution of 4X SSC and 1% BSA, and then 
washed with a solution of 4X SSC and 1% Triton X-100. 

The hybridized digoxigenin-labeled probe is detected
by adding a mixture of sheep anti-digoxigenin antibody
(Boehringer Mannheim) diluted in 0.1 M sodium phosphate,
pH 8.0, 5% nonfat dry milk, and 0.02% sodium azide,
followed by the addition of fluorescein-conjugated rabbit
anti-sheep IgG for detection. The slides are then washed
in PBS, mounted in Vectashield (Vector Laboratories, Inc.,
Burlingame, CA), and viewed by fluorescent microscopy.

Hybridization signals are enumerated in tumor derived
tissue and then compared to normal tissue. Normal tissue
displays two distinct hybridization signals, characteristic
of a diploid state. Enumeration of more than two
hybridization signals/cell is considered significant.

Example 3 
Expression of HCAVIII

Expression of foreign proteins is often performed in
E. coli when an immunogen or large amounts of protein are
desired, as in the development of a diagnostic kit. A
preferred system for E. coli expression has been described
(Smith, et al., "Single-step purification of polypeptides
expressed in Escherichia coli as fusions with glutathione-
S-transferase," Gene 1988; 67:31-40) whereby glutathione
transferase is expressed with amino acids representing the
cloned protein of interest attached to the carboxyl-
terminus. The fusion protein can then be purified via
affinity chromatography and the protein of interest fused
to glutathione transferase released by digestion with the
protease thrombin or alternatively the fusion protein is
released intact from the affinity column by competing
levels of free glutathione.

To express the HCAVIII protein (SEQ ID NO: 4) of this
invention in E. coli using the above described technology,
an expression plasmid was produced in which the
glutathione transferase gene is fused in frame with the
HCAVIII gene (SEQ ID NO:1) to produce a fusion protein. The
fusion gene/expression plasmid was assembled from nucleic
acids derived from the following sources. First, the
expression plasmid pGEX4T1 (Pharmacia, Piscataway, NJ) was
cleaved in the polycloning region with the restriction
endonucleases BamHI and EcoRI to permit insertion of the
HCAVIII gene. Second, an oligonucleotide was synthesized,
being 5'-GTCCACTTGGATCCGTTCACTGG-3' (SEQ ID NO:22). Using
the in vitro mutagenesis procedure described by Kunkel
(Proc Natl Acad Sci USA 1985; 82:488-492) and the above
oligonucleotide, a BamHI restriction site was created
without altering the amino acids encoded by the original
codons. In addition, the created BamHI site was situated
in correct reading frame and proximity to the predicted
cleavage site separating the signal peptide from the
mature protein. The DNA sequences encoding the mature
protein were released from the mutagenesis vector as a
BamHI/EcoRI fragment, where the EcoRI site originates from
a polycloning region of the DNA sequencing vector pUC19
found downstream of the HCAVIII gene. The DNA fragments
described above, comprising pGEX4T-1 cleaved at BamHI and
EcoRI and the HCAVIII gene released as a BamHI/EcoRI
fragment, were combined in a mixture composed of 1X T4
ligase buffer (50 mM Tris-HCl, 10 mM MgCl2, 20 mM
dithiothreitol, 1 mM ATP, 50 µg/ml BSA, final pH 7.5) and
T4 DNA ligase (New England Biolabs, Beverly, MA). The
ligated DNA was used to transform a suitable strain of E.
coli such as XL-1 Blue (Stratagene). The recovered
plasmid is sequenced to confirm the expected DNA sequence.
Protein expression is induced in E. coli with the chemical
isopropyl β-thiogalactoside, and the fusion protein is
released by cell lysis, followed by denaturation and
resolubilization of the fusion protein with 8 M urea/20
mM Tris-Cl (pH 8.5)/10 mM dithiothreitol, dialysis and
protein renaturation, and finally binding to an affinity
column composed of glutathione-agarose (Sigma, St. Louis,
MO) and cleavage with thrombin to release the HCAVIII
protein. The resulting protein is suitable as an
immunogen for polyclonal or monoclonal antibody production
and for usage in an ELISA kit as an internal standard and
positive control. Carbonic anhydrase enzyme activity (as
described in Example 6) was measured for E. coli-derived
HCAVIII and HCAVIII truncated form (SEQ ID NO: 15) and
compared to commercially obtained human carbonic anhydrase
II (Sigma, St. Louis, Mo.). The activity, as reported in
Enzyme Unit (U)/mg, for human carbonic anhydrase II was
3571 U/mg, for HCAVIII was 274 U/mg and for HCAVIII truncated
form was 2632 U/mg. These results indicated that an
enzymatically active and renaturable HCAVIII of comparable
enzymatic activity to human carbonic anhydrase II was
obtained from E. coli.

The length of the resulting protein can be varied by
altering the length of SEQ ID NO:1 prior to insertion into
the expression plasmid, or by cleavage of amino acids from
the protein resulting in the above example. Structure/
function studies of other HCA's suggest that modifications (as
defined by deletions at the N-terminal and C-terminal)
more extensive than disclosed in SEQ ID NO: 12 would still
permit the production and use of a protein as an immunogen
or standard, these deletions being a protein defined by
about amino acid residue 3 to amino acid residue 259 in
SEQ ID NO: 12. Using existing technology one could
synthesize a peptide of approximately 10 to 40 amino acids
in length that comprises a structural domain of HCAVIII.
This synthesized peptide, coupled to a carrier protein,
could be used for generating polyclonal antisera specific
for native HCAVIII.
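
The statement in this Example that the BamHI site is created without changing the encoded amino acids can be illustrated in code. The sketch below is not the applicants' procedure; it assumes that SEQ ID NO:22 is written as the antisense strand, that the wild-type segment is positions 107-127 of SEQ ID NO:1 as transcribed from the sequence listing, and it uses a codon table limited to the codons that occur in this segment. All names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Standard genetic code, restricted to the codons present in this segment.
CODONS = {
    "CCA": "Pro", "GTG": "Val", "AAC": "Asn", "GGT": "Gly",
    "GGA": "Gly", "TCC": "Ser", "AAG": "Lys", "TGG": "Trp",
}


def translate(seq: str) -> list:
    """Translate complete in-frame codons of `seq` using CODONS."""
    usable = len(seq) - len(seq) % 3
    return [CODONS[seq[i:i + 3]] for i in range(0, usable, 3)]


WILD_TYPE = "CCAGTGAACGGTTCCAAGTGG"   # SEQ ID NO:1, positions 107-127 (assumed)
OLIGO = "GTCCACTTGGATCCGTTCACTGG"     # SEQ ID NO:22 as printed
BAMHI_SITE = "GGATCC"

# Sense-strand view of the mutagenic oligonucleotide, trimmed to whole codons.
mutant = reverse_complement(OLIGO)[: len(WILD_TYPE)]

changed = [i for i, (a, b) in enumerate(zip(WILD_TYPE, mutant)) if a != b]
print("changed positions:", [107 + i for i in changed])
print("BamHI site in wild type:", BAMHI_SITE in WILD_TYPE)
print("BamHI site in mutant:", BAMHI_SITE in mutant)
print("same amino acids:", translate(WILD_TYPE) == translate(mutant))
```

Under these assumptions a single base differs (the third position of a Gly codon immediately before the first codon of the mature protein), which introduces GGATCC while leaving the translation unchanged.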

Example 4
Production of Antibodies to HCAVIII 

The production of polyclonal antisera is described in
great detail in Harlow, et al., Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratories, New York, 1988,
incorporated herein by reference. The HCAVIII protein
(SEQ ID NO: 4) in the presence of an adjuvant is injected
into rabbits with a series of booster shots on a
prescribed schedule optimal for high titers of antibody in
serum. A total of seven biweekly bleeds were obtained
from two rabbits immunized with HCAVIII truncated protein
(SEQ ID NO: 15). The resulting anti-HCAVIII serum titer
was compared to preimmune sera of the same rabbits and
determined to be 1000 to 2000-fold greater, hence suitable
as a reagent for indirect ELISA (Example 5). Rabbit
antibody was partially purified by precipitation with
ammonium sulfate (50%, final) followed by dialysis and
fractionation by preparative DEAE-HPLC.

An extensive description for producing monoclonal 
antibodies derived from the spleen B cells of an immunized 
mouse and an immortalized myeloma cell is found in the
above reference for polyclonal antisera production. Mice 
are immunized with either the purified HCAVIII protein or 
a glutathione/HCAVIII fusion protein. Following cell 
fusion, selection for hybrid cells and subcloning, 
hybridomas are screened for a positive antibody against 
whole A549 cells or purified HCAVIII protein using an 
indirect ELISA assay as described for the ELISA kit (see 
Example 5) . 

Example 5
ELISA Assay of Shed HCAVIII 

An indirect ELISA screening assay for HCAVIII protein 
(SEQ ID NO: 4) has been designed to detect and monitor the 
HCAVIII protein in body fluids including but not limited 
to serum and other biological fluids such as sputum or 
bronchial effluxion at effective levels necessary for 
sensitive but accurate determinations. It is intended to 
aid in the early diagnosis of non-small cell lung cancer, 
for which there currently is no effective treatment. An 
early-detection, accurate, non-invasive assay for non- 
small cell lung cancer would be of great benefit in the 
management of this disease. 

The immunochemicals used in this procedure were
rabbit anti-human HCAVIII antibody (purified IgG, IgM)
produced according to the procedure given in Example 4,
mouse anti-human HCAVIII (monoclonal) also produced
according to the procedure given in Example 4, and goat
anti-rabbit IgG/peroxidase conjugate. The HCAVIII protein
standard and internal positive control were produced as
described in Example 3 for expression in E. coli.
Substrate components include 1 M H2SO4 stored at room
temperature and 3,3',5,5'-tetramethylbenzidine (TMB) (Sigma
Chemical Co.) used as a peroxidase substrate and stored at
room temperature in the dark to prevent exposure to light.

Several buffers, diluents, and blocking agents were
used in the procedure. Note that no sodium azide
preservative was used in any of the buffers. This was
done to avoid any possible interference from the azide
with the peroxidase conjugate.

Phosphate buffered saline (PBS) was prepared by
adding 32.0 g sodium chloride, 0.8 g potassium phosphate,
monobasic, 0.8 g potassium chloride, and 4.6 g sodium
phosphate, dibasic, anhydrous, to 3.2 L deionized water
and mixing to dissolve. After bringing the solution to 4
L with deionized water and mixing, the pH was about 7.2.
The buffer can be stored at 4°C for a maximum of 3 weeks.

Two bovine serum albumin (BSA) solutions were
utilized as diluents. A 1% BSA solution in PBS, utilized
as the second antibody/conjugate diluent, was prepared by
adding 1 g BSA (bovine albumin, Fraction V, Sigma Chemical
Co.) to 80 ml of PBS, allowing it to stand as it slowly
goes into solution, adding PBS to a final volume of 100
ml, and then mixing. This diluent can be stored at 4°C
for a maximum of 2 weeks; however, if the solution becomes
turbid, it should be discarded. As a diluent for the
standards and samples, a 0.025% BSA solution in PBS was
prepared fresh for each assay by diluting the 1% BSA
diluent with PBS 1:40 (vol/vol).

A borate blocking buffer (0.17 M H3BO3, 0.12 M NaCl,
0.05% Tween 20, 1 mM EDTA and 0.25% BSA) was also used.

The substrate buffer was phosphate-citrate/sodium
perborate (Sigma, St. Louis, Mo.).
All assays were performed in Immulon IV plates
(Dynatech, Chantilly, VA #011-010-6301). The assay plates
were coated with a monoclonal antibody against HCAVIII by
adding 50 µl of a 10 µg/ml solution of antibody in PBS to
each well of Immulon IV plates. The plates were covered
and incubated overnight at room temperature. The antibody
solution was removed and the wells rinsed three times with
deionized water. Three-hundred microliters (300 µl) of
the borate blocking buffer was added to each well and
incubated at room temperature for thirty minutes. The
buffer was removed, the wells rinsed three times with
deionized water, and the plates air dried. The plates
were then wrapped and stored at 4°C.

The standard, E. coli-derived HCAVIII truncated protein
(SEQ ID NO: 15), was diluted to 32 ng/ml in PBS/0.025% BSA
and two-fold serial dilutions were made in the same diluent. The
samples were also diluted in PBS/0.025% BSA and 50 µl of
standard or sample was applied to each well. The plates
were incubated overnight, covered, at room temperature.
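
The two-fold dilution series of the standard can be written out explicitly. This is a minimal sketch with one labeled assumption: the text gives the starting concentration (32 ng/ml) but not the number of dilution steps, so the lower bound of 0.5 ng/ml is borrowed from the linear range reported later in this Example. The function name is hypothetical.

```python
def twofold_series(top_ng_per_ml: float, lowest_ng_per_ml: float) -> list:
    """List concentrations from `top` down to `lowest`, halving at each step."""
    if top_ng_per_ml <= 0 or lowest_ng_per_ml <= 0:
        raise ValueError("concentrations must be positive")
    series = []
    concentration = top_ng_per_ml
    while concentration >= lowest_ng_per_ml:
        series.append(concentration)
        concentration /= 2  # equal volumes of standard and PBS/0.025% BSA
    return series


# Assumption: the series is carried down to the 0.5 ng/ml sensitivity limit.
standards = twofold_series(32.0, 0.5)
print(standards)
print(len(standards), "standard concentrations")
```

Under that assumption the series comprises seven standard concentrations spanning the reported linear range.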

The standard and sample solutions were removed from
the wells and the wells were rinsed three times with
deionized water. Three-hundred microliters (300 µl) of
borate blocking buffer was added to each well and
incubated at room temperature for thirty minutes. The
plates were rinsed again with deionized water and tapped
(inverted) on paper towels to remove excess water. The
second antibody, rabbit antiserum to HCAVIII truncated
protein (SEQ ID NO: 15), was diluted to 1 µg/ml in PBS/1%
BSA and 50 µl was added to each well. The plates were
covered and incubated at room temperature for two hours.

The antibody solution was removed from the wells
which were then rinsed with deionized water three times.
They were then blocked for ten minutes at room temperature
with borate blocking buffer, rinsed again with deionized
water three times, and tapped on paper towels. The
antibody conjugate, goat F(ab')2 x rabbit IgG & IgL-HPRO
(Tago, Camarillo, CA) was diluted 1:16,000 in PBS/1% BSA
and 50 µl was added to each well. The plates were covered
and incubated at room temperature for two hours.

The antibody conjugate solution was removed from the
wells and they were rinsed with deionized water three
times, blocked with three-hundred µl borate buffer at room
temperature for ten minutes, rinsed three times with
deionized water, and tapped on paper towels. The
substrate was prepared no more than fifteen minutes before
use by dissolving one capsule of phosphate-citrate/sodium
perborate (Sigma, St. Louis, Mo.) in 100 ml water. For
each plate, one tablet of TMB was added to 10 ml of the
phosphate-citrate/sodium perborate buffer and syringe
filtered. One-hundred µl was added to each well and the
plates were covered and incubated at room temperature in
the dark for one hour. The reaction was stopped by adding
50 µl of 1 M H2SO4 to each well. The plates were read on a
Molecular Devices microplate reader at 450 nm. Under these
conditions, a linear response was obtained from 0.5 to 32
ng/ml using HCAVIII truncated protein as a standard, with
the assay sensitivity at 0.5 ng/ml. No cross-reaction was
observed against HCAII, an abundant carbonic anhydrase in
human serum.

Example 6

Carbonic Anhydrase (CA) Activity of Biopsy Tissue 

Ice cold solutions of ITB (20 mM imidazole, 5 mM
Tris, and 0.4 mM para-nitrophenol, pH 9.4-9.9) and Buffer
A (25 mM triethanolamine, 59 mM H2SO4, and 1 mM
benzamidine HCl) are prepared.

A homogenate is prepared by using a cell scraper to
scrape a monolayer of tissue cells, cultured from a tissue
sample taken from a biopsy, into 1-2 ml of Buffer A.
A portion of the sample is then boiled to inactivate CA.

A tube is placed in an ice water bath. For the
macroassay, a 10 x 75 mm glass tube and rubber stopper
with 16 gauge and 18 gauge needle ports are used; for the
microassay, a 6 x 50 mm glass tube and rubber stopper
with 18 gauge needle port and 20 gauge needle with
attached PE90 tubing. The sample is added along with
ice cold water to a final volume of 500 µl for macroassay
or 50 µl for microassay. 500 µl (macro) or 50 µl (micro)
ice cold water is used for a water control. 10 µl
antifoam (A. H. Thomas, Philadelphia, PA) is added to the
tube which is then incubated in ice water for 0.5 to 3
minutes.

The tube is capped with a stopper and CO2 at 150
ml/min (macro) or 100 ml/min (micro) is bubbled through
the smaller needle port for 30 sec.

50 µl (macro) or 50 µl (micro) of the ITB solution is
rapidly added through the larger needle port with a cold
Hamilton syringe. The sample becomes yellow.

Using a timer or stopwatch, the time at which the
solution in the tube becomes colorless is measured and
recorded. The tube may be momentarily removed from the
bath and held in front of a white background to determine
the color change. Comparison to a previously acidified
sample may be used.

The procedure is repeated with the boiled sample.
The volume of sample that corresponds to approximately one
enzyme unit is determined using the formula below.

Volume (1 EU) = volume used x log 2 / log (boiled time / activated time)

One enzyme unit is the activity that
halves the boiled control time.
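
The formula can be expressed as a short calculation. This is a minimal sketch in which "activated time" is taken to mean the time measured with the unboiled sample; the function names and the example timings are hypothetical and are not measured values from this document.

```python
import math


def enzyme_units(boiled_time_s: float, sample_time_s: float) -> float:
    """Units in the assayed volume: number of halvings of the boiled time."""
    if sample_time_s <= 0 or boiled_time_s <= sample_time_s:
        raise ValueError("sample must decolorize faster than the boiled control")
    return math.log(boiled_time_s / sample_time_s) / math.log(2)


def volume_for_one_unit(volume_used_ul: float, boiled_time_s: float,
                        sample_time_s: float) -> float:
    """Volume (1 EU) = volume used x log 2 / log(boiled time / activated time)."""
    return volume_used_ul / enzyme_units(boiled_time_s, sample_time_s)


# Hypothetical timings: the boiled control takes 60 s, the active sample 30 s.
print(volume_for_one_unit(50.0, 60.0, 30.0))  # the sample halves the time: 1 unit
print(volume_for_one_unit(50.0, 60.0, 15.0))  # two halvings: half the volume per unit
```

The guard rejects samples that are no faster than the boiled control, since the logarithm would then be zero or negative and no activity could be assigned.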

The assay is repeated 1-3 times with the sample and
boiled sample, using the adjusted volume of sample.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT:

(A) NAME: Cytoclonal Pharmaceutics, Inc.

(B) STREET: 9000 Harry Hines Blvd, Suite 330

(C) CITY: Dallas 

(D) STATE: Texas 

(E) COUNTRY: USA 

(F) POSTAL CODE (ZIP): 75235 

(G) TELEPHONE: (214) 353-2923 

(H) TELEFAX: (214) 350-9514 

( I ) TELEX : 

(ii) TITLE OF INVENTION: Lung Cancer Marker
(iii) NUMBER OF SEQUENCES: 22

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: RICHARDS, MEDLOCK & ANDREWS

(B) STREET: 1201 Elm Street, Suite 4500 

(C) CITY: Dallas 

(D) STATE: TX 

(E) COUNTRY: US 

(F) ZIP: 75270-2197 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: John A. Harre 

(B) REGISTRATION NUMBER: 37,345 

(C) REFERENCE/DOCKET NUMBER: B35792CIPPCT 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 214-939-4500 

(B) TELEFAX: 214-939-4600 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1104 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 32..1093

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 119..1093

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1013..1024

(D) OTHER INFORMATION: /note= "phosphorylation site
recognized by protein kinase C and other kina..."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GCCCGCGCCC GCCCCGCAGG AGCCCGCGAA G ATG CCC CGG CGC AGC CTG CAC 52 
Met Pro Arg Arg Ser Leu His 
-29 -25 



GCG GCG GCC GTG CTC CTG CTG GTG ATC TTA AAG GAA CAG CCT TCC AGC 100 
Ala Ala Ala Val Leu Leu Leu Val lie Leu Lys Glu Gin Pro Ser Ser 
-20 -15 -10 



CCG GCC CCA GTG AAC GGT TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG 148 
Pro Ala Pro Val Asn Gly Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly 
-5 15 10 



GAG AAT AGC TGG TCC AAG AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG 196
Glu Asn Ser Trp Ser Lys Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln
15 20 25

TCC CCC ATA GAC CTG CAC AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC 244
Ser Pro Ile Asp Leu His Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu
30 35 40

ACG CCC CTC GAG TTC CAA GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT 292
Thr Pro Leu Glu Phe Gln Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe
45 50 55

CTC CTG ACC AAC AAT GGC CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC 340
Leu Leu Thr Asn Asn Gly His Ser Val Lys Leu Asn Leu Pro Ser Asp
60 65 70

ATG CAC ATC CAG GGC CTC CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC 388
Met His Ile Gln Gly Leu Gln Ser Arg Tyr Ser Ala Thr Gln Leu His
75 80 85 90



CTG CAC TGG GGG AAC CCG AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC 436 
Leu His Trp Gly Asn Pro Asn Asp Pro His Gly Ser Glu His Thr Val 
95 100 105 

AGC GGA CAG CAC TTC GCC GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA 484
Ser Gly Gln His Phe Ala Ala Glu Leu His Ile Val His Tyr Asn Ser
110 115 120

GAC CTT TAT CCT GAC GCC AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC 532
Asp Leu Tyr Pro Asp Ala Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu
125 130 135

GCT GTC CTG GCT GTT CTC ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT 580
Ala Val Leu Ala Val Leu Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr
140 145 150

GAC AAG ATC TTC AGT CAC CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA 628
Asp Lys Ile Phe Ser His Leu Gln His Val Lys Tyr Lys Gly Gln Glu
155 160 165 170

GCA TTC GTC CCG GGA TTC AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC 676
Ala Phe Val Pro Gly Phe Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr
175 180 185

GCT GAA TAT TAC CGC TAC CGG GGG TCC CTG ACC ACA CCG CCT TGC AAC 724
Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn
190 195 200

CCC ACT GTG CTC TGG ACA GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG 772
Pro Thr Val Leu Trp Thr Val Phe Arg Asn Pro Val Gln Ile Ser Gln
205 210 215

GAG CAG CTG CTG GCT TTG GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC 820
Glu Gln Leu Leu Ala Leu Glu Thr Ala Leu Tyr Cys Thr His Met Asp
220 225 230

GAC CCT TCC CCC AGA GAA ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG 868
Asp Pro Ser Pro Arg Glu Met Ile Asn Asn Phe Arg Gln Val Gln Lys
235 240 245 250

TTC GAT GAG AGG CTG GTA TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT 916
Phe Asp Glu Arg Leu Val Tyr Thr Ser Phe Ser Gln Val Gln Val Cys
255 260 265

ACT GCG GCA GGA CTG AGT CTG GGC ATC ATC CTC TCA CTG GCC CTG GCT 964
Thr Ala Ala Gly Leu Ser Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala
270 275 280

GGC ATT CTT GGC ATC TGT ATT GTG GTG GTG GTG TCC ATT TGG CTT TTC 1012
Gly Ile Leu Gly Ile Cys Ile Val Val Val Val Ser Ile Trp Leu Phe
285 290 295

AGA AGG AAG AGT ATC AAA AAA GGT GAT AAC AAG GGA GTC ATT TAC AAG 1060
Arg Arg Lys Ser Ile Lys Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys
300 305 310

CCA GCC ACC AAG ATG GAG ACT GAG GCC CAC GCT TGAGGTCCCC G 1104 
Pro Ala Thr Lys Met Glu Thr Glu Ala His Ala 

315 320 325 



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 354 amino acids 

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Pro Arg Arg Ser Leu His Ala Ala Ala Val Leu Leu Leu Val Ile
-29 -25 -20 -15

Leu Lys Glu Gln Pro Ser Ser Pro Ala Pro Val Asn Gly Ser Lys Trp
-10 -5 1

Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys Lys Tyr Pro
5 10 15

Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His Ser Asp Ile
20 25 30 35

Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln Gly Tyr Asn
40 45 50

Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly His Ser Val
55 60 65

Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu Gln Ser Arg
70 75 80

Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro Asn Asp Pro
85 90 95

His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala Ala Glu Leu
100 105 110 115

His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala Ser Thr Ala
120 125 130

Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu Ile Glu Met
135 140 145

Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His Leu Gln His
150 155 160

Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe Asn Ile Glu
165 170 175

Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser
180 185 190 195

Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr Val Phe Arg
200 205 210

Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu Glu Thr Ala
215 220 225

Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu Met Ile Asn
230 235 240

Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val Tyr Thr Ser
245 250 255

Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser Leu Gly Ile
260 265 270 275

Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys Ile Val Val
280 285 290

Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys Lys Gly Asp
295 300 305

Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu Thr Glu Ala
310 315 320

His Ala
325



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 986 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..975

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 895..906

(D) OTHER INFORMATION: /note= "phosphorylation site
recognized by protein C kinase and other kina..." 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG 48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC 96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACC CCC CTC GAG TTC CAA 144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45



GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC 192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60

CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC 240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG 288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC 336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC 384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC 432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC 480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC 528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC 576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA 624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG 672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA 720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA 768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT 816
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
260 265 270

CTG GGC ATC ATC CTC TCA CTG GCC CTG GCT GGC ATT CTT GGC ATC TGT 864
Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys
275 280 285

ATT GTG GTG GTG GTG TCC ATT TGG CTT TTC AGA AGG AAG AGT ATC AAA 912
Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys
290 295 300

AAA GGT GAT AAC AAG GGA GTC ATT TAC AAG CCA GCC ACC AAG ATG GAG 960
Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu
305 310 315 320

ACT GAG GCC CAC GCT TGAGGTCCCC G 986
Thr Glu Ala His Ala
325



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 325 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175



WO 96/02552

PCT/US95/09145

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
260 265 270

Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys
275 280 285

Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys
290 295 300

Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu
305 310 315 320

Thr Glu Ala His Ala
325



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2134 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 116..1177

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 203..1177

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GTACTCGCCA CGGCACCCAG GCTGCGCGCA CGCGGTCCCG GTGTGCAGCT GGAGAGCGAG 60 

CGGCCACCGG GAGCCCCCGG CACAGCCCGC GCCCGCCCCG CAGGAGCCCG CGAAG ATG 118 

Met 
-29 



CCC CGG CGC AGC CTG CAC GCG GCG GCC GTG CTC CTG CTG GTG ATC TTA 166
Pro Arg Arg Ser Leu His Ala Ala Ala Val Leu Leu Leu Val Ile Leu
-25 -20 -15

AAG GAA CAG CCT TCC AGC CCG GCC CCA GTG AAC GGT TCC AAG TGG ACT 214
Lys Glu Gln Pro Ser Ser Pro Ala Pro Val Asn Gly Ser Lys Trp Thr
-10 -5 1

TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG AAG TAC CCG TCG 262
Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys Lys Tyr Pro Ser
5 10 15 20

TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC AGT GAC ATC CTC 310
Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His Ser Asp Ile Leu
25 30 35

CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA GGC TAC AAT CTG 358
Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln Gly Tyr Asn Leu
40 45 50

TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC CAT TCA GTG AAG 406
Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly His Ser Val Lys
55 60 65

CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC CAG TCT CGC TAC 454
Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu Gln Ser Arg Tyr
70 75 80

AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG AAT GAC CCG CAC 502
Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro Asn Asp Pro His
85 90 95 100

GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC GCC GAG CTG CAC 550
Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala Ala Glu Leu His
105 110 115

ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC AGC ACT GCC AGC 598
Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala Ser Thr Ala Ser
120 125 130

AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC ATT GAG ATG GGC 646
Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu Ile Glu Met Gly
135 140 145

TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC CTT CAA CAT GTA 694
Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His Leu Gln His Val
150 155 160

AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC AAC ATT GAA GAG 742
Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe Asn Ile Glu Glu
165 170 175 180

CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC CGG GGG TCC CTG 790
Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser Leu
185 190 195

ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA GTT TTC CGA AAC 838
Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr Val Phe Arg Asn
200 205 210

CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG GAG ACA GCC CTG 886
Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu Glu Thr Ala Leu
215 220 225

TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA ATG ATC AAC AAC 934
Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu Met Ile Asn Asn
230 235 240

TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA TAC ACC TCC TTC 982
Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val Tyr Thr Ser Phe
245 250 255 260

TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT CTG GGC ATC ATC 1030
Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser Leu Gly Ile Ile
265 270 275

CTC TCA CTG GCC CTG GCT GGC ATT CTT GGC ATC TGT ATT GTG GTG GTG 1078
Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys Ile Val Val Val
280 285 290

GTG TCC ATT TGG CTT TTC AGA AGG AAG AGT ATC AAA AAA GGT GAT AAC 1126
Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys Lys Gly Asp Asn
295 300 305

AAG GGA GTC ATT TAC AAG CCA GCC ACC AAG ATG GAG ACT GAG GCC CAC 1174
Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu Thr Glu Ala His
310 315 320

GCT TGAGGTCCCC GGAGCTCCCG GGCACATCCA GGAAGGACCT TGCTTTGGAC 1227 

Ala 

325 

CCTACACACT TCGGCTCTCT GGACACTTGC GACACCTCAA GGTGTTCTCT GTAGCTCAAT 1287 

CTGCAAACAT GCCAGGCCTC AGGGATCCTC TGCTGGGTGC CTCCTTGCCT TGGGACCATG 1347 

GCCACCCCAG AGCCATCCGA TCGATGGATG GGATGCACTC TCAGACCAAG CAGCAGGAAT 1407 

TCAAAGCTGC TTGCTGTAAC TGTGTGAGAT TGTGAAGTGG TCTGAATTCT GGAATCACAA 1467 

ACCAAGCCAT GCTGGTGGGC CATTAATGGT TGGAAAACAC TTTCATCCGG GGCTTTGCCA 1527 

GAGCGTGCTT TCAAGTGTCC TGGAAATTCT GCTGCTTCTC CAAGCTTTCA GACAAGAATG 1587 

TGCACTCTCT GCTTAGGTTT TGCTTGGGAA ACTCAACTTC TTTCCTCTGG AGACGGGGCA 1647 

TCTCCCTCTG ATTTCCTTCT GCTATGACAA AACCTTTAAT CTGCACCTTA CAACTCGGGG 1707 

ACAAATGGGG ACAGGAAGGA TCAAGTTGTA GAGAGAAAAA GAAAACAAGA GATATACATT 1767

GTGATATATT AGGGACACTT TCACAGTCCT GTCCTCTGGA TCACAGACAC TGCACAGACC 1827 

TTAGGGAATG GCAGGTTCAA GTTCCACTTC TTGGTGGGGA TGAGAAGGGA GAGAGAGCTA 1887

GAGGGACAAA GAGAATGAGA AGACATGGAT GATCTGGGAG AGTCTCACTT TGGAATCAGA 1947 

ATTGGAATCA CATTCTGTTT ATCAAGCCAT AATGTAAGGA CAGAATAATA CAATATTAAG 2007

TCCAAATCCA ACCTCCTGTC AGTGGAGCAG TTATGTTTTA TACTCTACAG ATTTTACAAA 2067
TAATGAGGCT GTTCCTTGAA AATGTGTTGT TGCTGTGTCC TGGAGGAGAC ATGAGTTCCG 2127 
AGATGAC 2134 

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 354 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Pro Arg Arg Ser Leu His Ala Ala Ala Val Leu Leu Leu Val Ile
-29 -25 -20 -15

Leu Lys Glu Gln Pro Ser Ser Pro Ala Pro Val Asn Gly Ser Lys Trp
-10 -5 1

Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys Lys Tyr Pro
5 10 15

Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His Ser Asp Ile
20 25 30 35

Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln Gly Tyr Asn
40 45 50

Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly His Ser Val
55 60 65

Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu Gln Ser Arg
70 75 80

Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro Asn Asp Pro
85 90 95

His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala Ala Glu Leu
100 105 110 115

His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala Ser Thr Ala
120 125 130

Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu Ile Glu Met
135 140 145

Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His Leu Gln His
150 155 160

Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe Asn Ile Glu
165 170 175

Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser
180 185 190 195

Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr Val Phe Arg
200 205 210

Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu Glu Thr Ala
215 220 225

Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu Met Ile Asn
230 235 240

Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val Tyr Thr Ser
245 250 255

Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser Leu Gly Ile
260 265 270 275

Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys Ile Val Val
280 285 290

Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys Lys Gly Asp
295 300 305

Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu Thr Glu Ala
310 315 320

His Ala
325

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 624 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

CCAATCTGCC TTTGAATCTG GAGGAAATAG GCAGAAACAA AATGACTGTA GAACTTATTC 60 

TCTGTAGGCC AAATTTCATT TCAGCCACTT CTGCAGGATC CCTACTGCCA ACCTGGAATG 120 

GAGACTTTTA TCTACTTCTC TCTCTCTGAA GATGTCAAAT CGTGGTTTAG ATCAAATATA 180

TTTCAAGCTA TAAAAGCAGG AGGTTATCTG TGCAGGGGGC TGGCATCATG TATTTAGGGG 240 

CAAGTAATAA TGGAATGCTA CTAAGATACT CCATATTCTT CCCCGAATCA CACAGACAGT 300 

TTCTGACAGG CGCAACTCCT CCATTTTCCT CCCGCAGGTG AGAACCCTGT GGAGATGAGT 360 

CAGTGCCATG ACTGACAAGG AACCGACCCC TAGTTGAGAG CACCTTGCAG TTCCCCGAGA 420

ACTTTCTGAT TCACAGTCTC ATTTTGACAG CATGAAATGT CCTCTTGAAG CATAGCTTTT 480

TAAATATCTT TTTCCTTCTA CTCCTCCCTC TGACTCTAAG AATTCTCTCT TCTGGAATCG 540

CTTGAACCCA GGAGGCGGAG GTTGCAGTAA GCCAAGGTCA TGCCACTGCA CTCTAGCCTG 600

GGTGACAGAG CGAGACTCCA TCTC 624



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

AGA AGG AAG AGT 12
Arg Arg Lys Ser
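
The pairing of codons and residues in this short sequence can be reproduced with the standard genetic code. The sketch below is illustrative only: the `translate` helper and the table construction are not part of the listing, and the first codon is read as AGA, consistent with the Arg residue printed beneath it.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) into one-letter residues."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# The 12-base sequence of SEQ ID NO:8.
print(translate("AGA AGG AAG AGT"))
```

The result, RRKS, is the one-letter form of Arg Arg Lys Ser, the four-residue peptide listed as SEQ ID NO:9.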



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Arg Arg Lys Ser 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

TGAGTCGACG 10



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
AATTCGTCGA CTCA 14 
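
SEQ ID NO:10 and SEQ ID NO:11 appear to be complementary over ten bases. The sketch below checks this; reading the two oligonucleotides as a pair that anneals to leave a four-base overhang is an inference from the sequences alone, not something stated in this listing, and the helper `reverse_complement` is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


SEQ10 = "TGAGTCGACG"      # SEQ ID NO:10, 10 bases
SEQ11 = "AATTCGTCGACTCA"  # SEQ ID NO:11, 14 bases

rc10 = reverse_complement(SEQ10)

# If the 3' portion of SEQ ID NO:11 equals the reverse complement of
# SEQ ID NO:10, the remaining 5' bases would be left single-stranded.
pairs = SEQ11.endswith(rc10)
overhang = SEQ11[: len(SEQ11) - len(rc10)] if pairs else None

print(rc10, pairs, overhang)
```

Here the reverse complement of SEQ ID NO:10 is CGTCGACTCA, which matches the last ten bases of SEQ ID NO:11 and leaves AATT unpaired at its 5' end. The paired region also contains GTCGAC, the recognition sequence of the restriction enzyme SalI.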



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 813 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..813 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG 48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC 96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA 144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45

GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC 192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60






CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC 240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG 288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC 336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC 384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC 432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC 480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC 528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC 576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA 624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG 672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA 720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA 768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG 813
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu
260 265 270



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 271 amino acids 

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu
260 265 270

(2) INFORMATION FOR SEQ ID NO:14:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 822 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS
(B) LOCATION: 1..822

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:



TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG 48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC 96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA 144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45

GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC 192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60

CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC 240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG 288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC 336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC 384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC 432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC 480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC 528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC 576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA 624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG 672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA 720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA 768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT 816
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
260 265 270

CTG GGC 822
Leu Gly



(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 274 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
260 265 270

Leu Gly



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CTTTTTTGAT ACCCTTCCTT CTGAA 25

(2) INFORMATION FOR SEQ ID NO: 17:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 986 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..975

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:



TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG 48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC 96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA 144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45

GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC 192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60

CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC 240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG 288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC 336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC 384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC 432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC 480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC 528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC 576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA 624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG 672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA 720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA 768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT 816
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
260 265 270

CTG GGC ATC ATC CTC TCA CTG GCC CTG GCT GGC ATT CTT GGC ATC TGT 864
Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys
275 280 285

ATT GTG GTG GTG GTG TCC ATT TGG CTT TTC AGA AGG AAG GGT ATC AAA 912
Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Gly Ile Lys
290 295 300

AAA GGT GAT AAC AAG GGA GTC ATT TAC AAG CCA GCC ACC AAG ATG GAG 960
Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu
305 310 315 320

ACT GAG GCC CAC GCT TGAGGTCCCC G 986
Thr Glu Ala His Ala
325
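
The 25-base oligonucleotide of SEQ ID NO:16 can be compared with the coding strand of SEQ ID NO:17 around the Arg-Arg-Lys-Gly codons. The sketch below is a consistency check only; the variable names are illustrative, the two 36-base regions are copied from the codons for Trp through Asp in SEQ ID NO:17 and SEQ ID NO:3, and any interpretation of the oligonucleotide's purpose (for example, as a mutagenesis primer) is an assumption not confirmed by this listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


SEQ16 = "CTTTTTTGATACCCTTCCTTCTGAA"

# Codons TGG CTT TTC AGA AGG AAG GGT ATC AAA AAA GGT GAT from SEQ ID NO:17
SEQ17_REGION = "TGGCTTTTCAGAAGGAAGGGTATCAAAAAAGGTGAT"
# The same region in SEQ ID NO:3, where the seventh codon is AGT (Ser)
SEQ3_REGION = "TGGCTTTTCAGAAGGAAGAGTATCAAAAAAGGTGAT"

rc16 = reverse_complement(SEQ16)

# Offset at which the reverse complement occurs in the SEQ ID NO:17 region
# (-1 would mean it is not present).
offset = SEQ17_REGION.find(rc16)

# Positions, within the 25-base window, where it differs from SEQ ID NO:3.
window = SEQ3_REGION[offset:offset + len(rc16)]
mismatches = [i for i, (a, b) in enumerate(zip(rc16, window)) if a != b]

print(rc16, offset, mismatches)
```

The reverse complement, TTCAGAAGGAAGGGTATCAAAAAAG, is found exactly in the SEQ ID NO:17 region and differs from the corresponding SEQ ID NO:3 region at a single base, the first position of the codon that reads AGT (Ser) in SEQ ID NO:3 and GGT (Gly) in SEQ ID NO:17.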



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 325 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 

Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys 
1 5 10 15 

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His 
20 25 30 

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln 
35 40 45 

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly 
50 55 60 

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu 
65 70 75 80 

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro 
85 90 95 

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala 
100 105 110 

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala 
115 120 125 

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu 
130 135 140 

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His 
145 150 155 160 

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe 
165 170 175 

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr 
180 185 190 

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr 
195 200 205 

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu 
210 215 220 

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu 
225 230 235 240 

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val 
245 250 255 

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser 
260 265 270 

Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys 
275 280 285 

Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Gly Ile Lys 
290 295 300 

Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu 
305 310 315 320 

Thr Glu Ala His Ala 
325 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
ACATTGAAGA GCTGCTTCCG G 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
AATTTGCACG GGGTTTCGG 
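
SEQ ID NO:19 matches the sense strand of the SEQ ID NO:17 coding sequence, and the reverse complement of SEQ ID NO:20 matches a site further downstream, which is consistent with the two oligonucleotides being used as a primer pair. That use is an assumption made here for illustration. The sketch below (helper names `reverse_complement` and `amplicon` are hypothetical) locates both sites on the three coding rows for residues 177 to 224 as printed in the listing:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


FORWARD = "ACATTGAAGAGCTGCTTCCGG"  # SEQ ID NO: 19
REVERSE = "AATTTGCACGGGGTTTCGG"    # SEQ ID NO: 20

# Coding rows for residues 177 to 224 of SEQ ID NO: 17, copied from the listing.
rows = [
    "AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC",
    "CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA",
    "GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG",
]
template = "".join(" ".join(rows).split())


def amplicon(template, forward, reverse):
    """Return the region bounded by the two primer sites, or None if either is absent."""
    start = template.find(forward)
    if start < 0:
        return None
    site = reverse_complement(reverse)
    hit = template.find(site, start + len(forward))
    if hit < 0:
        return None
    return template[start:hit + len(site)]


product = amplicon(template, FORWARD, REVERSE)
print(len(product) if product else "no product on this fragment")
```

On this printed fragment the two sites bound a 119 base pair stretch of coding sequence. A genomic template could give a different product length if an intron falls between the sites.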



(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1363 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 

CTGACACCAC TCAGACCGTG TGTGATCTGC CTCAACCA(?r TCTGCGATCC CACCCAGCSAA 60 

CAGAAGACTG CAAGAAAACG TTACTTCAAC CCCCCTGTGA TCCCATCTGC AACCTGACCA 120 

ATCAGCACTC CCCAAGTCCC AAGCCCCTAT CTGCCAAATT ATCTTTAAAA ACTCCCCAGA 180 

GGCAGGGTGC AGTGGTTCAA CGCCTGTAAT CCCAGCACTT TAGGTGGATC ACGAGATCAA 240 

GAGATCAAGA CCAGCCTGGC CAACATGGTG AAACCCCGTC TTCTTACTAA AAATACAAAA 300 

ATTAGCTGGG TGTGGCGGCG CGTGCCTGTA ATCCCAGCTA CCCAGGAGGC TGAGGCAGGA 360 

GAATCGCTTG AACCCGTGRG GCAGAGGTTG CAGTGAGCCA AGACCATGCC ACTGCATTTC 420 

AGCCTGGGCG ACAGAGGGGA ACTCCGTCTG AACAAACAAA CAAACAAACA ACTCCCGGAR 480 

TGCTTGGGGA GACTGATTTG AGTACTGGAA TCCCAGTACT TTAGGAGGCC AAGGTAGGTG 540 

GATCATTTGA GGTCAGGAGT TCCAGACCAG CCTGGCCAAC ATCGTGAAAC CCCGTCTCTA 600 

CTAAAATTAG AAAAATTAGC CGGGTGTGGT GGTGGGCGCC TGTAATCCCA GCACTTTGGG 660 

AAGCCAAGGC AGGTGAATTA TCTGAGGTCG GGAGTTTAAG GCCAGCCTTA AACTGGCGAA 720 

ACCCCGCCTC TACTAAAAAT ACAAAAATTA TCTGGGCATG GTGGCATGTG CCTGTAATCC 780 

CAGCTACTCG GGAGGCTGAG GCAGGAGAAT CGCTTGAACC CGGGAGGCGG AGGTTGCAGT 840 

GAGCCGAGAT CACGCTATTG CACTCCGGCC TGGGCAACAG AGCGAGACTC CGTCTCAAAC 900 

AAACAAACAA AGGAACGAAA ACTCCGGTCT CCGGCACCGC AAGCTCTGCG TGAATTACTT 960 

TCTCCATTGC AACTCCCCTG TCTTGATAAA TGGGCTCTGT CTAAGCAGCG GGCAAGGTGA 1020 

ACTCGTTGGG CTGTTACAGG ACCAGTGACA GACCAAGGCA TGCCACTGAA GGAATCCCTA 1080 

GACGCACCCT TCTGGATGTG AGGCAGGCGG ATCTCACCCC ACGCCTGCCA GCAGCTCCTC 1140 

GGAGAACTGT GTTCCTGGGT CAGCCCTGGC CCAGAGGAGC GCCGGGGACC CGCAGAGTGC 1200 

TGCTGAAGTC AAGGCTACAA CTCACCTAGG ATCTGGGGCG CCAGCCTCCG GTGGGCAGGG 1260 

CGTTCTCCTC CCCCACCCCC TCCCCGCACG ATGACATCAA GTGTTTGGCG TTGAGTTGCT 1320 

CCATAAAAGC TGCCCGGGGA ABCCAGGAGA GCGAAGGGCG GAC 1363 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GTCCACTTGG ATCCGTTCAC TGG 23 

WE CLAIM: 

1. A substantially purified nucleic acid encoding 
the amino acid sequence of HCAVIII depicted in SEQ ID 
NO: 2. 

2. The nucleic acid of Claim 1 wherein said nucleic 
acid is mRNA. 

3. A cDNA encoding the amino acid sequence of 
HCAVIII or a portion thereof. 

4. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the coding region of the nucleotide 
sequence depicted in SEQ ID NO:1. 

5. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 2. 

6. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the coding region of the nucleotide 
sequence depicted in SEQ ID NO: 3. 

7. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 4. 

8. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the nucleotide sequence depicted in 
SEQ ID NO: 12. 

9. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 13. 

10. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the nucleotide sequence depicted in 
SEQ ID NO: 14. 



11. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 15. 

12. The cDNA of Claim 3 comprising the nucleotide 
sequence depicted in SEQ ID NO: 5. 

13. The cDNA of Claim 3 comprising the nucleotide 
sequences depicted in SEQ ID NO: 5 and SEQ ID NO: 7. 

14. A cDNA encoding the amino acid sequence of 
HCAVIII wherein the phosphorylation region has been 
mutated. 

15. The cDNA of Claim 14 wherein the amino acid 
sequence is encoded by the nucleic acid sequence depicted 
in SEQ ID NO: 17. 



16. The cDNA of Claim 14 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 18. 

17. A protein comprising the amino acid sequence of 
HCAVIII or a portion thereof. 

18. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic 
acid sequence depicted in SEQ ID NO:1. 

19. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 2. 

20. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic 
acid sequence depicted in SEQ ID NO: 3. 

21. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 4. 

22. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic 
acid sequence depicted in SEQ ID NO: 12. 

23. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 13. 

24. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic 
acid sequence depicted in SEQ ID NO: 14. 

25. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 15. 

26. A protein comprising the amino acid sequence of 
HCAVIII wherein the phosphorylation region has been 
mutated. 

27. The protein of Claim 26 wherein the amino acid 
sequence is encoded by the nucleic acid sequence depicted 
in SEQ ID NO: 17. 

28. The protein of Claim 26 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 18. 

29. A recombinant DNA clone comprising a cDNA of a 
HCAVIII transcript isolatable from human A549 cells of 
about 1.1 kilobases. 

30. An expression vector comprising the nucleic 
sequence for HCAVIII or a portion thereof. 

31. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO:1. 

32. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO: 3. 

33. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO: 12. 

34. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO: 14. 

35. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the nucleotide sequence 
depicted in SEQ ID NO: 17. 



36. A method of detecting cancerous and precancerous 
lung tissue comprising: 

(a) preparing a section of biopsy tissue; 

(b) probing said tissue with a labeled probe 
complementary to the cDNA of SEQ ID NO:1; 

(c) removing said probe which has not hybridized to 
the tissue; and 

(d) detecting the presence of the hybridized probe. 

37. A method for detecting lung cancer antigen 
specific for non-small cell carcinoma in a human cell 
specimen comprising: 

a) labeling a DNA probe comprising the genomic clone 
of HCAVIII; 

b) reacting the labeled DNA probe with a human test 
cell specimen and a normal human cell specimen under 
conditions suitable for hybridization of the labeled probe 
to any HCAVIII mRNA which may be present in the test and 
normal cell specimen; 

c) removing unreacted components from the test and 
said normal cell specimens; 

d) detecting the hybridized probe bound to the test 
and normal cell specimens; 

e) quantifying and comparing the amount of hybridized 
probe bound to the test and normal cell specimens. 

38. The method of claim 37 further comprising: 

a) labeling a DNA probe comprising the genomic clone 
of HCAVIII with a substrate which can bind to a detecting 
substance to form a labeled DNA probe; 

b) reacting the labeled DNA probe with a human test 
cell specimen and a normal human cell specimen under 
conditions suitable for hybridization of the labeled probe 
to any HCAVIII mRNA which may be present in the test and 
normal cell specimens; 

c) removing unreacted components from the test and 
normal cell specimens; 

d) reacting the test and normal cell specimens with 
a detecting substance which is capable of fluorescing; 

e) comparing the fluorescence of the test and normal 
cell specimens. 

39. A method for screening human specimens for 
HCAVIII protein, comprising: 

a) mixing a human test specimen with a first amount 
of an antibody specific for the HCAVIII protein in a first 
reaction well; 

b) mixing a control lung cancer antigen comprising 
at least a portion of the HCAVIII protein with a second 
amount of said antibody specific for the HCAVIII protein 
in a second reaction well; and 

c) detecting whether said test specimen binds to 
said antibody as compared to said control lung cancer 
antigen. 

40. A method for testing a human cell sample for 
lung cancer comprising assaying a cell homogenate for 
carbonic anhydrase activity. 

41. An antibody made by immunizing animals with a 
lung cancer antigen associated with non-small cell lung 
cancer cells. 

42. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 2. 

43. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 4. 

44. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 13. 

45. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO:15. 

46. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO:18. 

47. A therapeutic composition for the treatment of 
non-small cell lung cancer comprising an antibody to 
HCAVIII protein bound to a substance which affects the 
ability of said cancer to replicate. 

48. The method of claim 47 wherein said substance is 
a cancer drug. 

49. The method of claim 48 wherein said substance is 
a radioisotope. 

50. The method of claim 49 wherein said substance 
affects gene expression of a gene encoding HCAVIII. 

51. A substantially purified nucleic acid comprising 
the nucleotide sequence depicted in SEQ ID NO: 7. 

52. A cDNA comprising the nucleotide sequence 
depicted in SEQ ID NO: 7. 

53. A substantially purified nucleic acid comprising 
the nucleotide sequence depicted in SEQ ID NO: 21. 

AMENDED CLAIMS 



[received by the International Bureau on 20 November 1995 (20.11.95); 
original claim 41 amended; remaining claims unchanged (1 page)] 



41. An antibody made by immunizing animals with 
HCAVIII, a lung cancer antigen associated with non-small 
cell lung cancer cells. 

42. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO:2. 

43. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO:4. 



44. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 13. 

45. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 15. 



46. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 18. 



AMENDED SHEET (ARTICLE 19) 